On Mon, Jun 17, 2013 at 7:48 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> I am told, one of the very popular setups for DR is to have one
>> local sync standby and one async (may be cascaded by the local sync). Since
>> this new feature is more useful for DR because taking a fresh backup on a
>> slower link is even more challenging, IMHO we should support such setups.
>
> ...which still doesn't make sense to me. Lets look at that in detail.
>
> Take 3 servers, A, B, C with A and B being linked by sync rep, and C
> being safety standby at a distance.
>
> Either A or B is master, except in disaster. So if A is master, then B
> would be the failover target. If A fails, then you want to failover to
> B. Once B is the target, you want to failback to A as the master. C
> needs to follow the new master, whichever it is.
>
> If you set up sync rep between A and B and this new mode between A and
> C. When B becomes the master, you need to failback from B from A, but
> you can't because the new mode applied between A and C only, so you
> have to failback from C to A. So having the new mode not match with
> sync rep means you are forcing people to failback using the slow link
> in the common case.

It's true that in this scenario that doesn't really make sense, but I
still think they are separate properties. You could certainly want
synchronous replication without this new property, if you like the
data-loss guarantees that sync rep provides but don't care about
failback. You could also want this new property without synchronous
replication, if you don't need the data-loss guarantees that sync rep
provides but you do care about fast failback. I admit it seems
unlikely that you would use both features but not target them at the
same machines, although maybe: perhaps you have a sync standby and an
async standby and want this new property with respect to both of them.

In my admittedly limited experience, the use case for a lot of this
technology is in the cloud. The general strategy seems to be: at the
first sign of trouble, kill the offending instance and fail over.
This can result in failing over pretty frequently, and needing it to
be fast. There may be no real hardware problem; indeed, the failover
may be precipitated by network conditions or overload of the physical
host backing the virtual machine or any number of other nonphysical
problems. I can see this being useful in that environment, even for
async standbys. People can apparently tolerate a brief interruption
while their primary gets killed off and connections are re-established
with the new master, but they need the failover to be fast. The
problem with the status quo is that, even if the first failover is
fast, the second one isn't, because it has to wait behind rebuilding
the original master.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company