Re: How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-07 Thread Robert Haas
On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote: On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com wrote: The WALSender

Re: How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-07 Thread Simon Riggs
On Mon, 2011-03-07 at 13:15 -0500, Robert Haas wrote: On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote: On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Mar 2, 2011 at 8:22 PM,

Re: How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-06 Thread Simon Riggs
On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote: On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com wrote: The WALSender deliberately does *not* wake waiting users if the standby disconnects.

Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Simon Riggs
On Thu, 2011-03-03 at 02:14 -0500, Tom Lane wrote: Fujii Masao masao.fu...@gmail.com writes: On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: To achieve the effect Fujii is looking for, we would have to silently drop the connection. That would

Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes: I don't understand how synchronous replication with allow_standalone_primary=on gives you ANY extra nines. AFAICS, the only point of having synchronous replication is that you wait to acknowledge the commit to the client until the commit record has

Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Robert Haas
On Wed, Mar 2, 2011 at 5:10 PM, Yeb Havinga yebhavi...@gmail.com wrote: On 2011-03-02 21:26, Kevin Grittner wrote: I think including synchronous is OK as long as it's properly qualified.  Off-hand thoughts in no particular order: semi-synchronous conditionally synchronous synchronous with

How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Fujii Masao
On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com wrote: The WALSender deliberately does *not* wake waiting users if the standby disconnects. Doing so would break the whole reason for having sync rep in

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: No, I've never wished wait-forever option for now. I'd like to make the primary work alone when there is no connected standby, for high-availability. allow_standalone_primary seems to need to be better through than it is now, yet neither

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-02-28 at 18:40 +, Simon Riggs wrote: SyncRepReleaseWaiters should be called when walsender exits. Otherwise, if the standby crashes while a

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Yeb Havinga
On 2011-03-02 11:40, Simon Riggs wrote: allow_standalone_primary seems to need to be better through than it is now, yet neither of us think its worth having. If the people that want it can think it through a little better then it might make this release, but I propose to remove it from this

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 6:22 AM, Simon Riggs si...@2ndquadrant.com wrote: The WALSender deliberately does *not* wake waiting users if the standby disconnects. Doing so would break the whole reason for having sync rep in the first place. What we do is allow a potential standby to takeover the

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com wrote: The WALSender deliberately does *not* wake waiting users if the standby disconnects. Doing so would break the whole reason for having sync rep in the first place. What we do is allow a potential standby to takeover the

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Wed, Mar 2, 2011 at 7:40 PM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: No, I've never wished wait-forever option for now. I'd like to make the primary work alone when there is no connected standby, for high-availability.

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Heikki Linnakangas
On 02.03.2011 12:40, Simon Riggs wrote: allow_standalone_primary seems to need to be better through than it is now, yet neither of us think its worth having. If the people that want it can think it through a little better then it might make this release, but I propose to remove it from this

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 9:30 AM, Fujii Masao masao.fu...@gmail.com wrote: What I'm thinking is: when the waiting backends are released because of the timeout while the fast shutdown is being done in the master, those backends should not return the success indication to the client. Of course, in

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 02.03.2011 12:40, Simon Riggs wrote: allow_standalone_primary seems to need to be better through than it is now, yet neither of us think its worth having. If the people that want it can think it

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Heikki Linnakangas
On 02.03.2011 17:07, Robert Haas wrote: On Wed, Mar 2, 2011 at 9:30 AM, Fujii Masaomasao.fu...@gmail.com wrote: What I'm thinking is: when the waiting backends are released because of the timeout while the fast shutdown is being done in the master, those backends should not return the success

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Heikki Linnakangas
On 02.03.2011 17:07, Robert Haas wrote: On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 02.03.2011 12:40, Simon Riggs wrote: allow_standalone_primary seems to need to be better through than it is now, yet neither of us think its worth having.

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: To achieve the effect Fujii is looking for, we would have to silently drop the connection. That would correctly leave the client not knowing whether the transaction committed or not. +1 It might be reasonable to COMMIT but also

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: The defining property of synchronous replication is that when a transaction is acknowledged as committed to the client, it has also been replicated to the standby. You don't achieve that with allow_standalone_primary=on, plain

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Aidan Van Dyk
On Wed, Mar 2, 2011 at 2:30 PM, Fujii Masao masao.fu...@gmail.com wrote: 1. The primary is running with allow_standalone_primary = on. There    is only one (synchronous) standby connected. OK. Explicitly configured to allow the master to report as commited stuff which isn't on a/any slave.

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Merlin Moncure
On Wed, Mar 2, 2011 at 9:58 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: To achieve the effect Fujii is looking for, we would have to silently drop the connection. That would correctly leave the client not knowing whether the

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 12:39 PM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Mar 2, 2011 at 9:58 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: To achieve the effect Fujii is looking for, we would have to silently drop

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 17:23 +0200, Heikki Linnakangas wrote: On 02.03.2011 17:07, Robert Haas wrote: On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 02.03.2011 12:40, Simon Riggs wrote: allow_standalone_primary seems to need to be better

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote: On 02.03.2011 12:40, Simon Riggs wrote: allow_standalone_primary seems to need to be better through than it is now, yet neither of us think its worth having. If the people that want it can think it through a little better then

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Heikki Linnakangas
On 02.03.2011 21:48, Simon Riggs wrote: On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote: On 02.03.2011 12:40, Simon Riggs wrote: allow_standalone_primary seems to need to be better through than it is now, yet neither of us think its worth having. If the people that want it can

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: All I'm saying is that if we end up shipping without that parameter (implying allow_standalone_primary=on), we need to call the feature something else. The GUCs and code can probably stay as it is, but we shouldn't use the term

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Joshua D. Drake
On Wed, 2011-03-02 at 14:26 -0600, Kevin Grittner wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: All I'm saying is that if we end up shipping without that parameter (implying allow_standalone_primary=on), we need to call the feature something else. The GUCs and code

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote: On 02.03.2011 21:48, Simon Riggs wrote: On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote: On 02.03.2011 12:40, Simon Riggs wrote: allow_standalone_primary seems to need to be better through than it is now, yet neither

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Andrew Dunstan
On 03/02/2011 03:39 PM, Simon Riggs wrote: Truly synchronous requires two-phase commit, which this never was. So the absence or presence of the poorly specified parameter called allow_standalone_primary should have no bearing on what we call this feature. I haven't been following this very

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote: allow_standalone_primary=off means wait forever. It does nothing to reduce data loss since you can't replicate to a server that isn't there. Unless you're pulling from some persistent source which will then feel free to discard what you have

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 3:45 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Simon Riggs si...@2ndquadrant.com wrote: allow_standalone_primary=off means wait forever. It does nothing to reduce data loss since you can't replicate to a server that isn't there. Unless you're pulling from

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 15:50 -0500, Robert Haas wrote: I assumed that when Simon was talking about removing allow_standalone_primary, he meant making the code always behave as if it were turned OFF. That is the part that is currently not fully specified, so no that is not currently included

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2011-03-02 at 15:50 -0500, Robert Haas wrote: I assumed that when Simon was talking about removing allow_standalone_primary, he meant making the code always behave as if it were turned OFF. That is the part that is currently not fully

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote: On 03/02/2011 03:39 PM, Simon Riggs wrote: Truly synchronous requires two-phase commit, which this never was. So the absence or presence of the poorly specified parameter called allow_standalone_primary should have no bearing on

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote: Fair enough. All I'm saying is that if we end up shipping without that parameter (implying allow_standalone_primary=on), we need to call the feature something else. The GUCs and code can

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes: 1. Everything is humming along. 2. The network link between the master and standby drops. 3. Then it comes back up again. After (2) and before (3), what should the behavior the master be? It seems clear to me that it should WAIT. Otherwise, a crash

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Andrew Dunstan
On 03/02/2011 04:13 PM, Simon Riggs wrote: On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote: On 03/02/2011 03:39 PM, Simon Riggs wrote: Truly synchronous requires two-phase commit, which this never was. So the absence or presence of the poorly specified parameter called

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 4:19 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: 1. Everything is humming along. 2. The network link between the master and standby drops. 3. Then it comes back up again. After (2) and before (3), what should the

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Yeb Havinga
On 2011-03-02 21:26, Kevin Grittner wrote: I think including synchronous is OK as long as it's properly qualified. Off-hand thoughts in no particular order: semi-synchronous conditionally synchronous synchronous with automatic failover to standalone It would be good to name the concept equal

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 16:16 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote: Fair enough. All I'm saying is that if we end up shipping without that parameter (implying allow_standalone_primary=on), we need to call

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Yeb Havinga yebhavi...@gmail.com wrote: On 2011-03-02 21:26, Kevin Grittner wrote: I think including synchronous is OK as long as it's properly qualified. Off-hand thoughts in no particular order: semi-synchronous conditionally synchronous synchronous with automatic failover to standalone

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 16:24 -0500, Andrew Dunstan wrote: On 03/02/2011 04:13 PM, Simon Riggs wrote: On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote: On 03/02/2011 03:39 PM, Simon Riggs wrote: Truly synchronous requires two-phase commit, which this never was. So the absence or

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Thu, Mar 3, 2011 at 5:50 AM, Robert Haas robertmh...@gmail.com wrote: I agree.  I assumed that when Simon was talking about removing allow_standalone_primary, he meant making the code always behave as if it were turned OFF. I feel the same thing.. Despite his saying, the patch implements

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Thu, Mar 3, 2011 at 6:33 AM, Robert Haas robertmh...@gmail.com wrote: I don't understand how synchronous replication with allow_standalone_primary=on gives you ANY extra nines. When you start the primary (or when there is one connected standby and it crashes), allow_standalone_primary = on

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: To achieve the effect Fujii is looking for, we would have to silently drop the connection. That would correctly leave the client not knowing whether the transaction committed or not. Yeah, this seems

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Thu, 2011-03-03 at 13:35 +0900, Fujii Masao wrote: On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: To achieve the effect Fujii is looking for, we would have to silently drop the connection. That would correctly leave the client not knowing

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes: On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: To achieve the effect Fujii is looking for, we would have to silently drop the connection. That would correctly leave the client not knowing whether the

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Simon Riggs
On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: Thanks for update of the patch! On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs si...@2ndquadrant.com wrote: SyncRepRemoveFromQueue seems not to be as short-term as we can use the spinlock. Instead, LW lock should be used there. You seem

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Simon Riggs
On Tue, 2011-03-01 at 16:28 +0900, Fujii Masao wrote: On Tue, Mar 1, 2011 at 3:39 AM, Simon Riggs si...@2ndquadrant.com wrote: PREPARE TRANSACTION and ROLLBACK PREPARED should wait for replication as well as COMMIT PREPARED? PREPARE - Yes ROLLBACK - No Further discussion welcome

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Fujii Masao
On Tue, Mar 1, 2011 at 5:21 PM, Simon Riggs si...@2ndquadrant.com wrote: A spinlock can be used only for very short-term operation like read/write of some shared-variables. The operation on the queue is not short, so should be protected by LWLock, I think. There's no need to sleep while

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Fujii Masao
On Tue, Mar 1, 2011 at 4:56 PM, Simon Riggs si...@2ndquadrant.com wrote: If allow_standalone_primary = on then we sit and wait until we hit client timeout, which occurs even after last standby has gone. In that case, why do backends need to wait until the timeout occurs? We can make those

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Fujii Masao
On Tue, Mar 1, 2011 at 5:29 PM, Simon Riggs si...@2ndquadrant.com wrote: What if fast shutdown is requested while RecordTransactionCommit is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete until replication has been successfully done (i.e., until at least one synchronous

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Yeb Havinga
On 2011-02-28 20:32, Simon Riggs wrote: I have reworded it to see if that improves the explanation Code available at git://github.com/simon2ndQuadrant/postgres.git If a standby is removed from the list of servers then it will stop being the synchronous standby, allowing another to take

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Robert Haas
On Tue, Mar 1, 2011 at 3:21 AM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: Thanks for update of the patch! On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs si...@2ndquadrant.com wrote: SyncRepRemoveFromQueue seems not to be as short-term as we

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: A spinlock can be used only for very short-term operation like read/write of some shared-variables. The operation on the queue is not short, so should be protected by LWLock, I think. There's no

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Tue, Mar 1, 2011 at 3:21 AM, Simon Riggs si...@2ndquadrant.com wrote: LWlocks are just spinlocks plus sem sleeps, so I don't see the need for that in the current code. Other views welcome. An LWLock is a lot safer, in general, than a spinlock. A

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Simon Riggs
On Tue, 2011-03-01 at 10:02 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: A spinlock can be used only for very short-term operation like read/write of some shared-variables. The operation on the queue is not short,

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Yeb Havinga
On 2011-02-25 20:40, Jaime Casanova wrote: On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havingayebhavi...@gmail.com wrote: I also did some initial testing on this patch and got the queue related errors with 1 clients. With the code change from Jaime above I still got a lot of 'not on queue

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-21 at 18:06 +0900, Fujii Masao wrote: Thanks for the patch! Thanks for the review. Code available at git://github.com/simon2ndQuadrant/postgres.git PREPARE TRANSACTION and ROLLBACK PREPARED should wait for replication as well as COMMIT PREPARED? PREPARE - Yes ROLLBACK - No

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-28 at 10:31 +0100, Yeb Havinga wrote: 1) no automatic switch to other synchronous standby - start master server, add synchronous standby 1 - change allow_standalone_primary to off - add second synchronous standby - wait until pg_stat_replication shows both standby's are in

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Thu, 2011-02-24 at 22:08 +0900, Fujii Masao wrote: On Tue, Feb 22, 2011 at 2:38 PM, Fujii Masao masao.fu...@gmail.com wrote: I've read about two-tenths of the patch, so I'll submit another comments about the rest later. Sorry for the slow reviewing... Here are another comments: Thanks

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Tue, 2011-02-22 at 14:38 +0900, Fujii Masao wrote: On Mon, Feb 21, 2011 at 6:06 PM, Fujii Masao masao.fu...@gmail.com wrote: I've read about a tenth of the patch, so I'll submit another comments about the rest later. Here are another comments: Thanks for your comments Code available at

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Fri, 2011-02-25 at 16:41 +0100, Yeb Havinga wrote: --- a/src/backend/replication/syncrep.c +++ b/src/backend/replication/syncrep.c @@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void) } else { + bool found = false; + while

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Thu, 2011-02-24 at 18:13 -0800, Daniel Farina wrote: I have also reproduced this. Notably, things seem fine as long as pgbench is confined to one backend, but as soon as two are used (-c 2) by the feature I can get segfaults. Sorry that you all experienced this. I wasn't able to get

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-21 at 21:35 +0900, Tatsuo Ishii wrote: Well, good news all round. v17 implements what I believe to be the final set of features for sync rep. This one I'm actually fairly happy with. It can be enjoyed best at DEBUG3. The patch is very lite touch on a few areas of

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Fri, 2011-02-25 at 16:41 +0100, Yeb Havinga wrote: I needed to add this to make the documentation compile --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF; You should also consider setting

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Sat, 2011-02-19 at 22:52 -0500, Robert Haas wrote: On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs si...@2ndquadrant.com wrote: First, we should be clear to explain that you are referring to the fact that the request synchronous_commit = off synchronous_replication = on makes no sense

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Sat, 2011-02-19 at 23:26 -0500, Robert Haas wrote: I believe the problem is that the definition of IsOnSyncRepQueue is bogus, so that the loop in SyncRepWaitOnQueue always takes the first branch. Sorry, don't see that. Jaime/Yeb fix applied. It was a little confusing to me setting this

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Robert Haas
On Mon, Feb 28, 2011 at 4:13 PM, Simon Riggs si...@2ndquadrant.com wrote: On Sat, 2011-02-19 at 23:26 -0500, Robert Haas wrote: I believe the problem is that the definition of IsOnSyncRepQueue is bogus, so that the loop in SyncRepWaitOnQueue always takes the first branch. Sorry, don't see

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote: Docs changed: git://github.com/simon2ndQuadrant/postgres.git I'm hoping you're going to post an updated patch once the current rash of updates is all done. Immediately prior to commit, yes. Everybody else has been nudging me towards

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Robert Haas
On Mon, Feb 28, 2011 at 4:36 PM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote: Docs changed: git://github.com/simon2ndQuadrant/postgres.git I'm hoping you're going to post an updated patch once the current rash of updates is all done.

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-28 at 16:55 -0500, Robert Haas wrote: On Mon, Feb 28, 2011 at 4:36 PM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote: Docs changed: git://github.com/simon2ndQuadrant/postgres.git I'm hoping you're going to post an updated

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-28 at 18:40 +, Simon Riggs wrote: SyncRepReleaseWaiters should be called when walsender exits. Otherwise, if the standby crashes while a transaction is waiting for replication, it waits infinitely. Will think on this. The behaviour seems correct to me: If

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Fujii Masao
On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-02-28 at 18:40 +, Simon Riggs wrote: SyncRepReleaseWaiters should be called when walsender exits. Otherwise, if the standby crashes while a transaction is waiting for replication, it waits infinitely.

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Fujii Masao
Thanks for update of the patch! On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs si...@2ndquadrant.com wrote: SyncRepRemoveFromQueue seems not to be as short-term as we can use the spinlock. Instead, LW lock should be used there. You seem to have forgotten to fix the above-mentioned issue. A

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Fujii Masao
On Tue, Mar 1, 2011 at 3:39 AM, Simon Riggs si...@2ndquadrant.com wrote: PREPARE TRANSACTION and ROLLBACK PREPARED should wait for replication as well as COMMIT PREPARED? PREPARE - Yes ROLLBACK - No Further discussion welcome If we don't make ROLLBACK PREPARED wait for replication, we

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-02-28 at 18:40 +, Simon Riggs wrote: SyncRepReleaseWaiters should be called when walsender exits. Otherwise, if the standby crashes while a

Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Yeb Havinga
On 2011-02-22 20:43, Jaime Casanova wrote: you can make this happen more easily, i just run pgbench -n -c10 -j10 test and qot that warning and sometimes a segmentation fault and sometimes a failed assertion and the problematic code starts at src/backend/replication/syncrep.c:277, here my

Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Jaime Casanova
On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga yebhavi...@gmail.com wrote: I also did some initial testing on this patch and got the queue related errors with 1 clients. With the code change from Jaime above I still got a lot of 'not on queue warnings'. I tried to understand how the queue

Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Jeff Davis
On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote: Oh, yes, this reproduces past shutdowns/startups, and there's quite a few txids before I catch up. I'm also comfortable poking around with gdb (I have already recompiled with debugging symbols and optimizations off and was poking around,

Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Daniel Farina
On Fri, Feb 25, 2011 at 4:52 PM, Jeff Davis pg...@j-davis.com wrote: On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote: Oh, yes, this reproduces past shutdowns/startups, and there's quite a few txids before I catch up. I'm also comfortable poking around with gdb (I have already recompiled

Re: [HACKERS] Sync Rep v17

2011-02-24 Thread Fujii Masao
On Tue, Feb 22, 2011 at 2:38 PM, Fujii Masao masao.fu...@gmail.com wrote: I've read about two-tenths of the patch, so I'll submit another comments about the rest later. Sorry for the slow reviewing... Here are another comments: + {synchronous_standby_names, PGC_SIGHUP,

Re: [HACKERS] Sync Rep v17

2011-02-24 Thread Daniel Farina
On Tue, Feb 22, 2011 at 11:43 AM, Jaime Casanova ja...@2ndquadrant.com wrote: On Sat, Feb 19, 2011 at 11:26 PM, Robert Haas robertmh...@gmail.com wrote: DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690 DEBUG:  released 0 procs up to 0/3014690 DEBUG:  write 0/3027BC8 flush 0/3027BC8

Re: [HACKERS] Sync Rep v17

2011-02-24 Thread Daniel Farina
With some more fooling around, I have also managed to get this elog(WARNING) if (proc-lwWaitLink == NULL) elog(WARNING, could not locate ourselves on wait queue); -- fdr -- Sent via pgsql-hackers mailing list

Re: [HACKERS] Sync Rep v17

2011-02-24 Thread Jaime Casanova
On Mon, Feb 21, 2011 at 4:06 AM, Fujii Masao masao.fu...@gmail.com wrote: PREPARE TRANSACTION and ROLLBACK PREPARED should wait for replication as well as COMMIT PREPARED? maybe ROLLBACK PREPARED but i'm not sure... i'm pretty sure we don't need to wait for PREPARE TRANSACTION, but i could be

Re: [HACKERS] Sync Rep v17

2011-02-23 Thread Daniel Farina
On Fri, Feb 18, 2011 at 4:06 PM, Simon Riggs si...@2ndquadrant.com wrote: Well, good news all round. v17 implements what I believe to be the final set of features for sync rep. This one I'm actually fairly happy with. It can be enjoyed best at DEBUG3. I've been messing with this patch and

Re: [HACKERS] Sync Rep v17

2011-02-23 Thread Daniel Farina
On Wed, Feb 23, 2011 at 10:39 PM, Daniel Farina dan...@heroku.com wrote: On Fri, Feb 18, 2011 at 4:06 PM, Simon Riggs si...@2ndquadrant.com wrote: Well, good news all round. v17 implements what I believe to be the final set of features for sync rep. This one I'm actually fairly happy with.

Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Marti Raudsepp
On Tue, Feb 22, 2011 at 07:38, Fujii Masao masao.fu...@gmail.com wrote: +       SpinLockAcquire(WalSndCtl-ctlmutex); +       result = WalSndCtl-sync_rep_service_available; +       SpinLockRelease(WalSndCtl-ctlmutex); volatile pointer needs to be used to prevent code rearrangement. I don't

Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Andres Freund
On Tuesday 22 February 2011 09:59:21 Marti Raudsepp wrote: On Tue, Feb 22, 2011 at 07:38, Fujii Masao masao.fu...@gmail.com wrote: + SpinLockAcquire(WalSndCtl-ctlmutex); + result = WalSndCtl-sync_rep_service_available; + SpinLockRelease(WalSndCtl-ctlmutex); volatile

Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Greg Smith
Daniel Farina wrote: As it will be somewhat hard to prove the durability guarantees of commit without special heroics, unless someone can suggest a mechanism. Could you introduce a hack creating deterministic server side crashes in order to test this out? The simplest thing that comes to

Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes: On Tue, Feb 22, 2011 at 07:38, Fujii Masao masao.fu...@gmail.com wrote: + SpinLockAcquire(WalSndCtl-ctlmutex); + result = WalSndCtl-sync_rep_service_available; + SpinLockRelease(WalSndCtl-ctlmutex); volatile pointer needs to be used to

Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Jaime Casanova
On Sat, Feb 19, 2011 at 11:26 PM, Robert Haas robertmh...@gmail.com wrote: DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690 DEBUG:  released 0 procs up to 0/3014690 DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690 DEBUG:  released 2 procs up to 0/3027BC8 WARNING:  could not

Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Daniel Farina
On Tue, Feb 22, 2011 at 5:04 AM, Greg Smith g...@2ndquadrant.com wrote: Daniel Farina wrote: As it will be somewhat hard to prove the durability guarantees of commit without special heroics, unless someone can suggest a mechanism. Could you introduce a hack creating deterministic server

Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Fujii Masao
On Sat, Feb 19, 2011 at 9:06 AM, Simon Riggs si...@2ndquadrant.com wrote: Well, good news all round. v17 implements what I believe to be the final set of features for sync rep. This one I'm actually fairly happy with. It can be enjoyed best at DEBUG3. The patch is very lite touch on a few

Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Tatsuo Ishii
Well, good news all round. v17 implements what I believe to be the final set of features for sync rep. This one I'm actually fairly happy with. It can be enjoyed best at DEBUG3. The patch is very lite touch on a few areas of code, plus a chunk of specific code, all on master-side. Pretty

Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Daniel Farina
On Mon, Feb 21, 2011 at 4:35 AM, Tatsuo Ishii is...@postgresql.org wrote: Well, good news all round. Hello on this thread, I'm taking a look at replication timeout with non-blocking which would be nice but not required for this patch, in my understanding. But before that, we're going to put

Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Fujii Masao
On Tue, Feb 22, 2011 at 7:55 AM, Daniel Farina dan...@heroku.com wrote: I'm taking a look at replication timeout with non-blocking which would be nice but not required for this patch, in my understanding. Why do you think so? You think sync_replication_timeout_client is sufficient for sync rep?

Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Fujii Masao
On Mon, Feb 21, 2011 at 6:06 PM, Fujii Masao masao.fu...@gmail.com wrote: I've read about a tenth of the patch, so I'll submit another comments about the rest later. Here are another comments: SyncRepReleaseWaiters should be called when walsender exits. Otherwise, if the standby crashes while

Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Fujii Masao
On Mon, Feb 21, 2011 at 9:35 PM, Tatsuo Ishii is...@postgresql.org wrote: +       primaryvarnamesynchronous_standby_names/ configuration parameter/primary +      /indexterm +      listitem +       para +        Specifies a list of standby names that can become the sole +        synchronous

  1   2   >