Re: [HACKERS] Issues with Quorum Commit

2010-10-20 Thread Bruce Momjian
Tom Lane wrote: Greg Smith g...@2ndquadrant.com writes: I don't see this as needing any implementation any more complicated than the usual way such timeouts are handled. Note how long you've been trying to reach the standby. Default to -1 for forever. And if you hit the timeout,

Re: [HACKERS] Issues with Quorum Commit

2010-10-14 Thread Greg Stark
On Tue, Oct 12, 2010 at 11:50 PM, Robert Haas robertmh...@gmail.com wrote: There's another problem here we should think about, too.  Suppose you have a master and two standbys.  The master dies.  You promote one of the standbys, which turns out to be behind the other.  You then repoint the

Re: [HACKERS] Issues with Quorum Commit

2010-10-14 Thread Robert Haas
On Wed, Oct 13, 2010 at 5:22 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Oct 13, 2010 at 3:50 PM, Robert Haas robertmh...@gmail.com wrote: There's another problem here we should think about, too.  Suppose you have a master and two standbys.  The master dies.  You promote one of the

Re: [HACKERS] Issues with Quorum Commit

2010-10-14 Thread Fujii Masao
On Thu, Oct 14, 2010 at 11:18 AM, Greg Stark gsst...@mit.edu wrote: Why don't the usual protections kick in here? The new record read from the location the xlog reader is expecting to find it has to have a valid CRC and a correct back pointer to the previous record. Yep. In most cases, those

Re: [HACKERS] Issues with Quorum Commit

2010-10-14 Thread Robert Haas
On Wed, Oct 13, 2010 at 10:18 PM, Greg Stark gsst...@mit.edu wrote: On Tue, Oct 12, 2010 at 11:50 PM, Robert Haas robertmh...@gmail.com wrote: There's another problem here we should think about, too.  Suppose you have a master and two standbys.  The master dies.  You promote one of the

Re: [HACKERS] Issues with Quorum Commit

2010-10-13 Thread Heikki Linnakangas
On 13.10.2010 08:21, Fujii Masao wrote: On Sat, Oct 9, 2010 at 4:31 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It shouldn't be too hard to fix. Walsender needs to be able to read WAL from preceding timelines, like recovery does, and walreceiver needs to write the

Re: [HACKERS] Issues with Quorum Commit

2010-10-13 Thread Robert Haas
On Wed, Oct 13, 2010 at 2:43 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 13.10.2010 08:21, Fujii Masao wrote: On Sat, Oct 9, 2010 at 4:31 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com  wrote: It shouldn't be too hard to fix. Walsender needs to be able to

Re: [HACKERS] Issues with Quorum Commit

2010-10-13 Thread Markus Wanner
On 10/13/2010 06:43 AM, Fujii Masao wrote: Unfortunately even enough standbys don't increase write-availability unless you choose wait-forever. Because, after promoting one of standbys to new master, you must keep all the transactions waiting until at least one standby has connected to and

Re: [HACKERS] Issues with Quorum Commit

2010-10-13 Thread Fujii Masao
On Wed, Oct 13, 2010 at 3:43 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 13.10.2010 08:21, Fujii Masao wrote: On Sat, Oct 9, 2010 at 4:31 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com  wrote: It shouldn't be too hard to fix. Walsender needs to be able to

Re: [HACKERS] Issues with Quorum Commit

2010-10-13 Thread Fujii Masao
On Wed, Oct 13, 2010 at 3:50 PM, Robert Haas robertmh...@gmail.com wrote: There's another problem here we should think about, too.  Suppose you have a master and two standbys.  The master dies.  You promote one of the standbys, which turns out to be behind the other.  You then repoint the

Re: [HACKERS] Issues with Quorum Commit

2010-10-12 Thread Fujii Masao
On Sat, Oct 9, 2010 at 12:12 AM, Markus Wanner mar...@bluegap.ch wrote: On 10/08/2010 04:48 PM, Fujii Masao wrote: I believe many systems require write-availability. Sure. Make sure you have enough standbys to fail over to. Unfortunately even enough standbys don't increase write-availability

Re: [HACKERS] Issues with Quorum Commit

2010-10-12 Thread Fujii Masao
On Sat, Oct 9, 2010 at 1:41 AM, Josh Berkus j...@agliodbs.com wrote: And, I'd like to know whether the master waits forever because of the standby failure in other solutions such as Oracle DataGuard, MySQL semi-synchronous replication. MySQL used to be fond of simply failing silently.  Not

Re: [HACKERS] Issues with Quorum Commit

2010-10-12 Thread Fujii Masao
On Sat, Oct 9, 2010 at 4:31 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Yes. But if there is no unsent WAL when the master goes down, we can start new standby without new backup by copying the timeline history file from new master to new standby and setting

Re: [HACKERS] Issues with Quorum Commit

2010-10-11 Thread Markus Wanner
Greg, to me it looks like we have very similar goals, but start from different preconditions. I absolutely agree with you given the preconditions you named. On 10/08/2010 10:04 PM, Greg Smith wrote: How is that a new problem? It's already possible to end up with a standby pair that has

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Dimitri Fontaine
Greg Smith g...@2ndquadrant.com writes: […] I don't see this as needing any implementation any more complicated than the usual way such timeouts are handled. Note how long you've been trying to reach the standby. Default to -1 for forever. And if you hit the timeout, mark the standby as

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 12:30 AM, Simon Riggs wrote: I do, but it's not a parameter. The k = 1 behaviour is hardcoded and considerably simplifies the design. Moving to k > 1 is additional work, slows things down and seems likely to be fragile. Perfect! So I'm all in favor of committing that, but leaving

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
Simon, On 10/08/2010 12:25 AM, Simon Riggs wrote: Asking for k > 1 does *not* mean those servers are time synchronised. Yes, it's technically impossible to create a fully synchronized cluster (on the basis of shared-nothing nodes we are aiming for, that is). There always is some kind of lag on

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Heikki Linnakangas
On 07.10.2010 21:38, Markus Wanner wrote: On 10/07/2010 03:19 PM, Dimitri Fontaine wrote: I think you're all into durability, and that's good. The extra cost is service downtime It's just *reduced* availability. That doesn't necessarily mean downtime, if you combine cleverly with async

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 04:01 AM, Fujii Masao wrote: Really? I don't think that ko-count=0 means wait-forever. Telling from the documentation, I'd also say it doesn't wait forever by default. However, please note that there are different parameters for the initial wait for connection during boot up

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Heikki Linnakangas
On 08.10.2010 06:41, Fujii Masao wrote: On Thu, Oct 7, 2010 at 3:01 AM, Markus Wanner mar...@bluegap.ch wrote: Of course, it doesn't make sense to wait-forever on *every* standby that ever gets added. Quorum commit is required, yes (and that's what this thread is about, IIRC). But with quorum

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Simon Riggs
On Fri, 2010-10-08 at 09:52 +0200, Markus Wanner wrote: One addendum: a timeout increases availability at the cost of increased danger of data loss and higher complexity. Don't use it, just increase (N - k) instead. Completely agree. -- Simon Riggs www.2ndQuadrant.com

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 05:41 AM, Fujii Masao wrote: But, even with quorum commit, if you choose wait-forever option, failover would decrease availability. Right after the failover, no standby has connected to new master, so if quorum = 1, all the transactions must wait for a while. That's a point,

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Heikki Linnakangas
On 08.10.2010 01:25, Simon Riggs wrote: On Thu, 2010-10-07 at 13:44 -0400, Aidan Van Dyk wrote: To get non-stale responses, you can only query those k=3 servers. But you've shot yourself in the foot because you don't know which 3/10 those will be. The other 7 *are* stale (by definition).

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Simon Riggs
On Fri, 2010-10-08 at 10:56 +0300, Heikki Linnakangas wrote: Or what kind of customers do you think really need a no-lag solution for read-only queries? In the LAN case, the lag of async rep is negligible and in the WAN case the latencies of sync rep are prohibitive. There is a very

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Heikki Linnakangas
On 08.10.2010 11:25, Simon Riggs wrote: On Fri, 2010-10-08 at 10:56 +0300, Heikki Linnakangas wrote: Or what kind of customers do you think really need a no-lag solution for read-only queries? In the LAN case, the lag of async rep is negligible and in the WAN case the latencies of sync rep are

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 10:27 AM, Heikki Linnakangas wrote: Synchronous replication in the 'replay' mode is supposed to guarantee exactly that, no? The master may lag behind, so it's not strictly speaking the same data. Regards Markus Wanner -- Sent via pgsql-hackers mailing list

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 09:56 AM, Heikki Linnakangas wrote: Imagine a web application that's mostly read-only, but a user can modify his own personal details like name and address, for example. Imagine that the user changes his street address and clicks 'save', causing an UPDATE, and the next query

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Simon Riggs
On Fri, 2010-10-08 at 11:27 +0300, Heikki Linnakangas wrote: On 08.10.2010 11:25, Simon Riggs wrote: On Fri, 2010-10-08 at 10:56 +0300, Heikki Linnakangas wrote: Or what kind of customers do you think really need a no-lag solution for read-only queries? In the LAN case, the lag of async

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 01:44 AM, Greg Smith wrote: They'll use Sync Rep to maximize the odds a system failure doesn't cause any transaction loss. They'll use good quality hardware on the master so it's unlikely to fail. ..unlikely to fail? Ehm.. is that you speaking, Greg? ;-) But when the

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 11:00 AM, Simon Riggs wrote: From the perspective of an observer, randomly selecting a standby for load balancing purposes: No, they are not guaranteed to see the latest answer, nor even can they find out whether what they are seeing is the latest answer. I completely agree. The

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Dimitri Fontaine
Markus Wanner mar...@bluegap.ch writes: ..and how do you make sure you are not marking your second standby as degraded just because it's currently lagging? Well, in sync rep, a standby that's not able to stay under the timeout is degraded. Full stop. The presence of the timeout (or its value

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 11:41 AM, Dimitri Fontaine wrote: Same old story. Either you're able to try and fix the master so that you don't lose any data and don't even have to check for that, or you take a risk and start from a non synced standby. It's all availability against durability again. ..and a

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Dimitri Fontaine
Markus Wanner mar...@bluegap.ch writes: ..and a whole lot of manual work, that's prone to error for something that could easily be automated So, the master just crashed, first standby is dead and second ain't in sync. What's the easy and automated way out? Sorry, I need a hand here. --

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Fujii Masao
On Fri, Oct 8, 2010 at 5:07 PM, Markus Wanner mar...@bluegap.ch wrote: On 10/08/2010 04:01 AM, Fujii Masao wrote: Really? I don't think that ko-count=0 means wait-forever. Telling from the documentation, I'd also say it doesn't wait forever by default. However, please note that there are

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes: I don't see this as needing any implementation any more complicated than the usual way such timeouts are handled. Note how long you've been trying to reach the standby. Default to -1 for forever. And if you hit the timeout, mark the standby as
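
A minimal sketch of the timeout rule quoted above, written in Python and assuming nothing about the actual patch: the master notes when it started waiting for a standby, a timeout of -1 means wait forever, and any other value is a number of seconds after which the wait is considered expired. The names `WAIT_FOREVER` and `standby_timed_out` are hypothetical and exist only for illustration. What the master does once the timeout expires, and where it would record that, is the open question in this thread and is not modelled here.

```python
import time

# Hypothetical sentinel: -1 means "never give up on the standby".
WAIT_FOREVER = -1


def standby_timed_out(wait_started, timeout_s, now=None):
    """Return True if the wait for a standby has exceeded the timeout.

    wait_started and now are readings of a monotonic clock in seconds;
    timeout_s is either WAIT_FOREVER or a number of seconds.
    """
    if timeout_s == WAIT_FOREVER:
        return False  # wait-forever: the timeout can never expire
    if now is None:
        now = time.monotonic()
    return (now - wait_started) >= timeout_s
```

With `timeout_s = WAIT_FOREVER` the function never reports a timeout, which corresponds to the wait-forever behaviour debated elsewhere in the thread.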

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Fujii Masao
On Fri, Oct 8, 2010 at 5:10 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Do we really need that? Yes. But if there is no unsent WAL when the master goes down, we can start new standby without new backup by copying the timeline history file from new master to new standby and

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 04:11 PM, Tom Lane wrote: Actually, #2 seems rather difficult even if you want it. Presumably you'd like to keep that state in reliable storage, so it survives master crashes. But how you gonna commit a change to that state, if you just lost every standby (suppose master's

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes: Well, actually, that's *considerably* more complicated than just a timeout. How are you going to mark the standby as degraded? The standby can't keep that information, because it's not even connected when the master makes the decision. ISTM that this

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 12:05 PM, Dimitri Fontaine wrote: Markus Wanner mar...@bluegap.ch writes: ..and a whole lot of manual work, that's prone to error for something that could easily be automated So, the master just crashed, first standby is dead and second ain't in sync. What's the easy and

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Tom Lane
Markus Wanner mar...@bluegap.ch writes: On 10/08/2010 04:11 PM, Tom Lane wrote: Actually, #2 seems rather difficult even if you want it. Presumably you'd like to keep that state in reliable storage, so it survives master crashes. But how you gonna commit a change to that state, if you just

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 04:38 PM, Tom Lane wrote: Markus Wanner mar...@bluegap.ch writes: IIUC you seem to assume that the master node keeps its master role. But users who value availability a lot certainly want automatic fail-over, Huh? Surely loss of the slaves shouldn't force a failover. Maybe

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Simon Riggs
On Fri, 2010-10-08 at 10:11 -0400, Tom Lane wrote: 1. a unique identifier for each standby (not just role names that multiple standbys might share); That is difficult because each standby is identical. If a standby goes down, people can regenerate a new standby by taking a copy from another

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Fujii Masao
On Fri, Oct 8, 2010 at 5:16 PM, Markus Wanner mar...@bluegap.ch wrote: On 10/08/2010 05:41 AM, Fujii Masao wrote: But, even with quorum commit, if you choose wait-forever option, failover would decrease availability. Right after the failover, no standby has connected to new master, so if

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Fujii Masao
On Fri, Oct 8, 2010 at 6:00 PM, Simon Riggs si...@2ndquadrant.com wrote: From the perspective of an observer, randomly selecting a standby for load balancing purposes: No, they are not guaranteed to see the latest answer, nor even can they find out whether what they are seeing is the latest

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Simon Riggs
On Fri, 2010-10-08 at 23:55 +0900, Fujii Masao wrote: On Fri, Oct 8, 2010 at 6:00 PM, Simon Riggs si...@2ndquadrant.com wrote: From the perspective of an observer, randomly selecting a standby for load balancing purposes: No, they are not guaranteed to see the latest answer, nor even can

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 04:47 PM, Simon Riggs wrote: Yes, I really want to avoid such issues and likely complexities we get into trying to solve them. In reality they should not be common because it only happens if the sysadmin has not configured sufficient number of redundant standbys. Well, full

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Markus Wanner
On 10/08/2010 04:48 PM, Fujii Masao wrote: I believe many systems require write-availability. Sure. Make sure you have enough standbys to fail over to. (I think there are even more situations where read-availability is much more important, though). Start with 0 (i.e. replication off), then

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Josh Berkus
And, I'd like to know whether the master waits forever because of the standby failure in other solutions such as Oracle DataGuard, MySQL semi-synchronous replication. MySQL used to be fond of simply failing silently. Not sure what 5.4 does, or Oracle. In any case MySQL's replication has

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Rob Wultsch
On 10/8/10, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Oct 8, 2010 at 5:10 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Do we really need that? Yes. But if there is no unsent WAL when the master goes down, we can start new standby without new backup by copying

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Heikki Linnakangas
On 08.10.2010 17:26, Fujii Masao wrote: On Fri, Oct 8, 2010 at 5:10 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Do we really need that? Yes. But if there is no unsent WAL when the master goes down, we can start new standby without new backup by copying the timeline

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Greg Smith
Markus Wanner wrote: ..and how do you make sure you are not marking your second standby as degraded just because it's currently lagging? Effectively degrading the utterly needed one, because your first standby has just bitten the dust? People are going to monitor the standby lag. If it

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Greg Smith
Tom Lane wrote: How are you going to mark the standby as degraded? The standby can't keep that information, because it's not even connected when the master makes the decision. From a high level, I'm assuming only that the master has a list in memory of the standby system(s) it believes are

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Simon Riggs
On Fri, 2010-10-08 at 17:06 +0200, Markus Wanner wrote: Well, full cluster outages are infrequent, but sadly cannot be avoided entirely. (Murphy's laughing). IMO we should be prepared to deal with those. I've described how I propose to deal with those. I'm not waving away these issues, just

Re: [HACKERS] Issues with Quorum Commit

2010-10-08 Thread Simon Riggs
On Fri, 2010-10-08 at 16:34 -0400, Greg Smith wrote: Tom Lane wrote: How are you going to mark the standby as degraded? The standby can't keep that information, because it's not even connected when the master makes the decision. From a high level, I'm assuming only that the master has

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Simon Riggs
On Wed, 2010-10-06 at 10:57 -0700, Josh Berkus wrote: I also strongly believe that we should get single-standby functionality committed and tested *first*, before working further on multi-standby. Yes, let's get k = 1 first. With k = 1 the number of standbys is not limited, so we can still

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Simon Riggs
On Wed, 2010-10-06 at 10:57 -0700, Josh Berkus wrote: (2), (3) Degradation: (Jeff) these two cases make sense only if we give DBAs the tools they need to monitor which standbys are falling behind, and to drop and replace those standbys. Otherwise we risk giving DBAs false confidence that

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Markus Wanner
On 10/06/2010 10:01 PM, Simon Riggs wrote: The code to implement your desired option is more complex and really should come later. I'm sorry, but I think of that exactly the opposite way. The timeout for automatic continuation after waiting for a standby is the addition. The wait state of the

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Dimitri Fontaine
Markus Wanner mar...@bluegap.ch writes: I'm just saying that this should be an option, not the only choice. I'm sorry, I just don't see the use case for a mode that drops guarantees when they are most needed. People who don't need those guarantees should definitely go for async replication

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Heikki Linnakangas
On 07.10.2010 12:52, Dimitri Fontaine wrote: Markus Wanner mar...@bluegap.ch writes: I'm just saying that this should be an option, not the only choice. I'm sorry, I just don't see the use case for a mode that drops guarantees when they are most needed. People who don't need those guarantees

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Either that, or you configure your system for asynchronous replication first, and flip the switch to synchronous only after the standby has caught up. Setting up the first standby happens only once when you initially set up the

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Simon Riggs
On Thu, 2010-10-07 at 11:46 +0200, Markus Wanner wrote: On 10/06/2010 10:01 PM, Simon Riggs wrote: The code to implement your desired option is more complex and really should come later. I'm sorry, but I think of that exactly the opposite way. I see why you say that. Dimitri's suggestion

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Markus Wanner
On 10/07/2010 01:08 PM, Simon Riggs wrote: Adding timeout is very little code. We can take that out of the patch if that's an objection. Okay. If you take it out, we are at the wait-forever option, right? If not, I definitely don't understand how you envision things to happen. I've been asking

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Robert Haas
On Thu, Oct 7, 2010 at 3:30 AM, Simon Riggs si...@2ndquadrant.com wrote: Yes, let's get k = 1 first. With k = 1 the number of standbys is not limited, so we can still have very robust and highly available architectures. So we mean first-acknowledgement-releases-waiters. +1. I like the design

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Markus Wanner
Salut Dimitri, On 10/07/2010 12:32 PM, Dimitri Fontaine wrote: Another one is to say that I want sync rep when the standby is available, but I don't have the budget for more. So I prefer a good alerting system and low-budget-no-guarantee when the standby is down, that's my risk evaluation. I

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Dimitri Fontaine
Markus Wanner mar...@bluegap.ch writes: Why does one ever want the guarantee that sync replication gives to only hold true up to one failure, if a better guarantee doesn't cost anything extra? (Note that a good alerting system is impossible to achieve with only two servers. You need a third

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Aidan Van Dyk
On Thu, Oct 7, 2010 at 6:32 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Or if the standby is lagging and the master wal_keep_segments is not sized big enough. Is that a catastrophic loss of the standby too? Sure, but that lagged standby is already asynchronous, not synchronous. If it

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Dimitri Fontaine
Aidan Van Dyk ai...@highrise.ca writes: Sure, but that lagged standby is already asynchronous, not synchronous. If it was synchronous, it would have slowed the master down enough it would not be lagged. Agreed, except in the case of a joining standby. But you're saying it better than I do:

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Aidan Van Dyk
On Thu, Oct 7, 2010 at 10:08 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Aidan Van Dyk ai...@highrise.ca writes: Sure, but that lagged standby is already asynchronous, not synchronous.  If it was synchronous, it would have slowed the master down enough it would not be lagged. Agreed,

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Dimitri Fontaine
Aidan Van Dyk ai...@highrise.ca writes: *shrug* The joining standby is still asynchronous at this point. It's not synchronous replication. It's just another ^k of the N slaves serving stale data ;-) Agreed *here*, but if you read the threads again, you'll see that's not at all what's been

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Greg Smith
Markus Wanner wrote: I think that's a pretty special case, because the good alerting system is at least as expensive as another server that just persistently stores and ACKs incoming WAL. The cost of hardware capable of running a database server is a large multiple of what you can build an

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Josh Berkus
On 10/7/10 6:41 AM, Aidan Van Dyk wrote: I'm really confused with all these k < N scenarios I see bandied about, because all it really amounts to is I only want *one* synchronous replication, and a bunch of asynchronous replications. And a bit of chance thrown in the mix to hope the synchronous

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Aidan Van Dyk
On Thu, Oct 7, 2010 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: So if you have k = 3 and N = 10, then you can have 10 standbys and only 3 of them need to ack any specific commit for the master to proceed. As long as (a) you retain at least one of the 3 which ack'd, and (b) you have some
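
The k-of-N rule described in this message can be sketched in a few lines of Python. This is only an illustration of the counting logic under stated assumptions, not code from any patch in the thread: the function name `commit_released` and the standby names are hypothetical, and acknowledgements are modelled as a plain list of standby names.

```python
def commit_released(acked_standbys, k):
    """Return True once at least k distinct standbys have acknowledged.

    acked_standbys is an iterable of standby names (hypothetical labels);
    repeated acknowledgements from the same standby count only once.
    """
    return len(set(acked_standbys)) >= k


# N = 10 standbys are attached, but only k = 3 must ack a given commit.
N, K = 10, 3
standbys = [f"standby{i}" for i in range(1, N + 1)]

acks = []
released_after = None
# Acknowledgements arrive in arbitrary order; standby7 acks twice.
for name in ["standby7", "standby2", "standby7", "standby9", "standby4"]:
    acks.append(name)
    if commit_released(acks, K):
        released_after = len(acks)  # the master may proceed at this point
        break
```

With k = 1 the same check reduces to the first-acknowledgement-releases-waiters behaviour mentioned elsewhere in the thread. Which three standbys acknowledged is not known in advance, which is the point raised about stale reads on the other seven.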

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Josh Berkus
If you want synchronous replication because you want query availability while making sure you're not getting stale queries from all your slaves, then using your k < N (k = 3 and N = 10) situation is screwing yourself. Correct. If that is your reason for synch standby, then you should be using

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Markus Wanner
On 10/07/2010 06:41 PM, Greg Smith wrote: The cost of hardware capable of running a database server is a large multiple of what you can build an alerting machine for. You realize you don't need lots of disks nor RAM for a box that only ACKs? A box with two SAS disks and a BBU isn't that

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Josh Berkus
But as a practical matter, I'm afraid the true cost of the better guarantee you're suggesting here is additional code complexity that will likely cause this feature to miss 9.1 altogether. As far as I'm concerned, this whole diversion into the topic of quorum commit is only consuming

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Kevin Grittner
Aidan Van Dyk ai...@highrise.ca wrote: To get non-stale responses, you can only query those k=3 servers. But you've shot yourself in the foot because you don't know which 3/10 those will be. The other 7 *are* stale (by definition). They talk about picking the caught up slave when the

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Robert Haas
On Thu, Oct 7, 2010 at 2:10 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Aidan Van Dyk ai...@highrise.ca wrote: To get non-stale responses, you can only query those k=3 servers.  But you've shot yourself in the foot because you don't know which 3/10 those will be.  The other 7 *are*

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: With web applications, at least, you often don't care that the data read is absolutely up-to-date, as long as the point in time doesn't jump around from one request to the next. When we have used

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Markus Wanner
On 10/07/2010 03:19 PM, Dimitri Fontaine wrote: I think you're all into durability, and that's good. The extra cost is service downtime It's just *reduced* availability. That doesn't necessarily mean downtime, if you combine cleverly with async replication. if that's not what you're after:

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Markus Wanner
On 10/07/2010 07:44 PM, Aidan Van Dyk wrote: The only case I see a race to quorum type of k < N being useful is if you're just trying to duplicate data everywhere, but not actually querying any of the replicas. I can see that all queries go to the master, but the chances are pretty high the

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Robert Haas
On Thu, Oct 7, 2010 at 2:31 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: With web applications, at least, you often don't care that the data read is absolutely up-to-date, as long as the point

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Dimitri Fontaine
Markus Wanner mar...@bluegap.ch writes: I don't buy that. The risk calculation gets a lot simpler and obvious with strict guarantees. Ok, I'm lost in the use cases and analysis. I still don't understand why you want to consider the system already synchronous when it's not, whatever is the

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: Establishing an affinity between a session and one of the database servers will only help if the traffic is strictly read-only. Thanks; I now see your point. In our environment, that's pretty common. Our most heavily used web app (the one for which

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Simon Riggs
On Thu, 2010-10-07 at 13:44 -0400, Aidan Van Dyk wrote: To get non-stale responses, you can only query those k=3 servers. But you've shot yourself in the foot because you don't know which 3/10 those will be. The other 7 *are* stale (by definition). They talk about picking the caught up

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Simon Riggs
On Thu, 2010-10-07 at 19:50 +0200, Markus Wanner wrote: So far I've been under the impression that Simon already has the code for quorum_commit k = 1. I do, but it's not a parameter. The k = 1 behaviour is hardcoded and considerably simplifies the design. Moving to k > 1 is additional work,

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Josh Berkus
All, Establishing an affinity between a session and one of the database servers will only help if the traffic is strictly read-only. I think this thread has drifted very far away from anything we're going to do for 9.1. And seems to have little to do with synchronous replication. Synch rep

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Greg Smith
Markus Wanner wrote: So far I've been under the impression that Simon already has the code for quorum_commit k = 1. What I'm opposed to is the timeout feature, which I consider to be additional code, unneeded complexity and foot-gun. Additional code? Yes. Foot-gun? Yes. Timeout should

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Fujii Masao
On Wed, Oct 6, 2010 at 6:11 PM, Markus Wanner mar...@bluegap.ch wrote: Yeah, sounds more likely. Then I'm surprised that I didn't find any warning that the Protocol C definitely reduces availability (with the ko-count=0 default, that is). Really? I don't think that ko-count=0 means

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Fujii Masao
On Wed, Oct 6, 2010 at 6:00 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: In general, salvaging the WAL that was not sent to the standby yet is outright impossible. You can't achieve zero data loss with asynchronous replication at all. No. That depends on the type of

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Robert Haas
On Thu, Oct 7, 2010 at 10:24 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Oct 6, 2010 at 6:00 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: In general, salvaging the WAL that was not sent to the standby yet is outright impossible. You can't achieve zero data loss

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Fujii Masao
On Wed, Oct 6, 2010 at 9:22 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: From my experience operating londiste, those states would be:  1. base-backup  — self explaining  2. catch-up     — getting the WAL to catch up after base backup  3. wanna-sync   — don't yet have all the WAL to get

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Fujii Masao
On Thu, Oct 7, 2010 at 5:01 AM, Simon Riggs si...@2ndquadrant.com wrote: You seem willing to trade anything for that guarantee. I seek a more pragmatic approach that balances availability and risk. Those views are different, but not inconsistent. Oracle manages to offer multiple options and

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Fujii Masao
On Thu, Oct 7, 2010 at 3:01 AM, Markus Wanner mar...@bluegap.ch wrote: Of course, it doesn't make sense to wait-forever on *every* standby that ever gets added. Quorum commit is required, yes (and that's what this thread is about, IIRC). But with quorum commit, adding a standby only improves

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Joshua D. Drake
On Thu, 2010-10-07 at 19:44 -0400, Greg Smith wrote: I don't see this as needing any implementation any more complicated than the usual way such timeouts are handled. Note how long you've been trying to reach the standby. Default to -1 for forever. And if you hit the timeout, mark the
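
The timeout convention quoted above (track how long you have been waiting, with -1 meaning wait forever) might be sketched as follows. This is an illustration only, not the proposed patch: `wait_for_standby`, `is_acked`, and the polling interval are hypothetical names and choices, and the function deliberately leaves the reaction to a timeout to the caller, since the excerpt is cut off before saying what should happen.

```python
import time


def wait_for_standby(is_acked, timeout_s=-1, poll_s=0.01,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_acked() until it returns True or the timeout expires.

    timeout_s = -1 (the default) means wait forever; any value >= 0 is a
    limit in seconds. Returns True if the standby acknowledged, False if
    the timeout was hit first.
    """
    start = clock()
    while not is_acked():
        # Only enforce a deadline when a non-negative timeout was given.
        if timeout_s >= 0 and clock() - start >= timeout_s:
            return False
        sleep(poll_s)
    return True
```

Defaulting to -1 keeps the strict wait-forever behaviour unless an administrator explicitly asks for something else, which matches the default argued for in the thread.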

Re: [HACKERS] Issues with Quorum Commit

2010-10-07 Thread Fujii Masao
On Fri, Oct 8, 2010 at 8:44 AM, Greg Smith g...@2ndquadrant.com wrote: Additional code?  Yes.  Foot-gun?  Yes.  Timeout should be disabled by default so that you get wait forever unless you ask for something different?  Probably.  Unneeded?  This is where we don't agree anymore.  The example

Re: [HACKERS] Issues with Quorum Commit

2010-10-06 Thread Heikki Linnakangas
On 06.10.2010 01:14, Josh Berkus wrote: Last I checked, our goal with synch standby was to increase availability, not decrease it. No. Synchronous replication does not help with availability. It allows you to achieve zero data loss, i.e. if the master dies, you are guaranteed that any

Re: [HACKERS] Issues with Quorum Commit

2010-10-06 Thread Heikki Linnakangas
On 06.10.2010 01:14, Josh Berkus wrote: You start a new one from the latest base backup and let it catch up? Possibly modifying the config file in the master to let it know about the new standby, if we go down that path. This part doesn't seem particularly hard to me. Agreed, not sure of the

Re: [HACKERS] Issues with Quorum Commit

2010-10-06 Thread Markus Wanner
On 10/06/2010 04:31 AM, Simon Riggs wrote: That situation would require two things * First, you have set up async replication and you're not monitoring it properly. Shame on you. The way I read it, Jeff is complaining about the timeout you propose that effectively turns sync into async

Re: [HACKERS] Issues with Quorum Commit

2010-10-06 Thread Fujii Masao
On Wed, Oct 6, 2010 at 10:52 AM, Jeff Davis pg...@j-davis.com wrote: I'm not sure I entirely understand. I was concerned about the case of a standby server being allowed to lag behind the rest by a large number of WAL records. That can't happen in the wait for all servers to apply case,

Re: [HACKERS] Issues with Quorum Commit

2010-10-06 Thread Markus Wanner
On 10/06/2010 08:31 AM, Heikki Linnakangas wrote: On 06.10.2010 01:14, Josh Berkus wrote: Last I checked, our goal with synch standby was to increase availability, not decrease it. No. Synchronous replication does not help with availability. It allows you to achieve zero data loss, i.e. if
