Re: [HACKERS] Standalone synchronous master
At this point I feel that this new functionality might be a bit of overkill for postgres; maybe it's better to stay lean and mean rather than add a controversial feature like this. I also agree that a more general replication timeout variable would be more useful to a larger audience, but that would in my view add more complexity to the replication code, which is quite simple and understandable right now.

Anyway, my backup plan was to achieve the same thing by triggering on the logging produced on the primary server and switching to async mode when detecting that the standby replication link has failed (and then back again when it is restored). In effect I would put a replication monitor on the outside of the server instead of embedding it.

/A

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
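The "monitor on the outside" idea above can be illustrated with a small decision routine. This is a hypothetical sketch, not part of any patch: the log patterns are modeled on the messages shown later in this thread, and applying the decision would mean editing synchronous_standby_names in postgresql.conf and reloading.

```python
import re

# Hypothetical log patterns, modeled on messages quoted in this thread;
# an external monitor would tail the primary's log and react to them.
STANDBY_CONNECTED = re.compile(r'standby "[^"]+" is now the synchronous standby')
STANDBY_LOST = re.compile(r"unexpected EOF on standby connection")

def desired_mode(log_line, current_mode="sync"):
    """Return the replication mode the monitor should enforce after
    seeing this log line: 'sync' or 'async'."""
    if STANDBY_LOST.search(log_line):
        return "async"   # standby link failed: fall back to async
    if STANDBY_CONNECTED.search(log_line):
        return "sync"    # standby is back: re-enable sync replication
    return current_mode

# The monitor would then apply the decision, e.g. by rewriting
# synchronous_standby_names in postgresql.conf and sending SIGHUP
# (pg_ctl reload) to the primary.
```

This keeps the policy outside the server, at the cost of reaction latency and one more moving part to keep alive.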
Re: [HACKERS] Standalone synchronous master
Okay, here’s version 3 then, which piggy-backs on the existing flag:

    synchronous_commit = on | off | local | fallback

where “fallback” now means “fall back from sync replication when no (suitable) standbys are connected”. This was done on input from Guillaume Lelarge.

> That said, I agree it's not necessarily reasonable to try to defend
> against that in a two node cluster.

That’s what I’ve been trying to say all along, but I didn’t give enough context before, so I understand we took a turn there. You can always walk up to any setup and say “hey, if you nuke that site from orbit and crash that other thing, and ...” ;) I’m just kidding of course, but you get the point. Nothing is absolute.

And so we get back to the three likelihoods in our two-node setup:

1. The master fails - okay, promote the standby.
2. The standby fails - okay, the system still works, but you no longer have data redundancy. Deal with it.
3. Both fail, together or one after the other.

I’ve stated that 1 and 2 together cover way more than 99.9% of what’s expected in my setup on any given day. But 3 is what we’ve been talking about. And in that case there is no reason to just go ahead and promote a standby, because, granted, it could be lagging behind if the master decided to switch to standalone mode just before going down itself.

As long as you do not prematurely, or rather instinctively, promote the standby when it has *possibly* lagged behind, you’re good and there is no risk of data loss. The data might be sitting on a crashed or otherwise unavailable master, but it’s not lost. Promoting the standby, however, is basically saying “forget the master and its data, continue from where the standby is currently at”.

Granted, this is operationally harder/more complicated than plain synchronous replication, where you can always, in any case, just promote the standby after a master failure, knowing that all data is guaranteed to be replicated.
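The promotion rule argued for above boils down to a one-line decision. A toy sketch (function name and return values are illustrative, not part of the patch):

```python
def failover_decision(replication_link_up_at_failure):
    """Two-node failover rule from the discussion above.

    If the replication link was still up when the master died, every
    acknowledged commit is also on the standby, so promotion is lossless.
    If the link had already dropped, the master may have continued in
    standalone ("fallback") mode, so the standby could lag: do not
    auto-promote, alert an operator instead.
    """
    if replication_link_up_at_failure:
        return "promote"
    return "hold and alert"
```

The point is that the extra operational step is a check, not a guess: the primary's log (or a monitoring system) records whether the link was down before the crash.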
> I'm worried that the interface seems a bit
> fragile and that it's hard to "be sure".

With this setup, you can’t promote the standby without first checking whether the replication link was disconnected prior to the master failure. For me, the benefits outweigh this one drawback, because I get a stronger standby replication guarantee than async replication and better master availability than sync replication in the most plausible outcomes.

Cheers,

/A

sync-standalone-v3.patch
Description: Binary data
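For reference, under the proposed v3 patch a two-node primary might be configured like this (the standby name is taken from the log excerpts elsewhere in this thread; treat this as an illustrative sketch, not patch documentation):

```ini
# postgresql.conf on the primary, assuming the proposed v3 patch
synchronous_commit = fallback           # sync rep, but fall back to local
                                        # commit when no suitable standby
                                        # is connected
synchronous_standby_names = 'tx0113'    # the single synchronous standby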
Re: [HACKERS] Standalone synchronous master
On Mon, Dec 26, 2011 at 5:18 PM, Guillaume Lelarge wrote:
> On Mon, 2011-12-26 at 16:23 +0100, Magnus Hagander wrote:
>> On Mon, Dec 26, 2011 at 15:59, Alexander Björnhagen wrote:
>> >>>> Basically I like this whole idea, but I'd like to know why do you think
>> >>>> this functionality is required?
>> >
>> >>> How should a synchronous master handle the situation where all
>> >>> standbys have failed ?
>> >>>
>> >>> Well, I think this is one of those cases where you could argue either
>> >>> way. Someone caring more about high availability of the system will
>> >>> want to let the master continue and just raise an alert to the
>> >>> operators. Someone looking for an absolute guarantee of data
>> >>> replication will say otherwise.
>> >
>> >> If you don't care about the absolute guarantee of data, why not just
>> >> use async replication? It's still going to replicate the data over to
>> >> the client as quickly as it can - which in the end is the same level
>> >> of guarantee that you get with this switch set, isn't it?
>> >
>> > This setup does still guarantee that if the master fails, then you can
>> > still fail over to the standby without any possible data loss because
>> > all data is synchronously replicated.
>>
>> Only if you didn't have a network hitch, or if your slave was down.
>>
>> Which basically means it doesn't *guarantee* it.
>
> It doesn't guarantee it, but it increases the master availability.

Yes, exactly.

> That's the kind of customization some users would like to have. Though I
> find it weird to introduce another GUC there. Why not add a new enum
> value to synchronous_commit, such as local_only_if_slaves_unavailable
> (yeah, the enum value is completely stupid, but you get my point).

You are right, an enum makes much more sense, and the patch would be much smaller as well, so I’ll rework that bit.
/A
Re: [HACKERS] Standalone synchronous master
Hmm, I suppose this conversation would lend itself better to a whiteboard, or maybe a few beers, instead of e-mail ...

> Basically I like this whole idea, but I'd like to know why do you think
> this functionality is required?

How should a synchronous master handle the situation where all standbys have failed?

Well, I think this is one of those cases where you could argue either way. Someone caring more about high availability of the system will want to let the master continue and just raise an alert to the operators. Someone looking for an absolute guarantee of data replication will say otherwise.

>>> If you don't care about the absolute guarantee of data, why not just
>>> use async replication? It's still going to replicate the data over to
>>> the client as quickly as it can - which in the end is the same level
>>> of guarantee that you get with this switch set, isn't it?

>> This setup does still guarantee that if the master fails, then you can
>> still fail over to the standby without any possible data loss because
>> all data is synchronously replicated.

> Only if you didn't have a network hitch, or if your slave was down.
> Which basically means it doesn't *guarantee* it.

True. In my two-node system, I’m willing to take that risk when my only standby has failed. Most likely (compared to any other scenario), we can re-gain redundancy before another failure occurs.

Say each one of your nodes can fail once a year. Most people have a much better track record than that with their production machines/network/etc., but just as an example. Then on any given day there is a 0.27% chance that a given node will fail (1/365 * 100 ≈ 0.27), right? The probability of both failing on the same day is then (0.27%)^2 ≈ 0.00075%, or about 1 in 133,000. And given that it would take only a few hours tops to restore redundancy, the exposure is even smaller than that, because you would not be exposed for the entire day.
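Redoing that arithmetic under the stated assumption (one failure per node per year, failures independent) gives a considerably smaller joint probability than a first glance might suggest:

```python
p_day = 1 / 365.0        # chance a given node fails on a given day
p_both = p_day ** 2      # both nodes failing on the same day,
                         # assuming independent failures

print(round(100 * p_day, 2))   # daily per-node failure chance, in percent
print(round(1 / p_both))       # "1 in N" odds of a same-day double failure
```

With these numbers the per-node daily chance is about 0.27%, and the double-failure odds come out to roughly 1 in 133,000 days, before even accounting for redundancy being restored within hours rather than a full day.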
So, to be a bit blunt about it, and I hope I don’t come off as rude here, this dual-failure or creeping-doom type scenario on a two-node system is probably not relevant, but more of an academic question.

>> I want to replicate data with synchronous guarantee to a disaster site
>> *when possible*. If there is any chance that commits can be
>> replicated, then I’d like to wait for that.

> There's always a chance, it's just about how long you're willing to wait ;)

Yes, exactly. When I can estimate it, I’m willing to wait.

> Another thought could be to have something like a "sync_wait_timeout",
> saying "i'm willing to wait <x> seconds for the syncrep to be caught
> up. If nobody is caught up within that time, then I can back down to
> async mode/standalone mode". That way, data availability wouldn't
> be affected by short-time network glitches.

This was also mentioned in the previous thread I linked to, as “replication_timeout”:

http://archives.postgresql.org/pgsql-hackers/2010-10/msg01009.php

In an HA environment you have redundant networking and bonded interfaces on each node. The only “glitch” would really be if a switch failed over, and that’s a pretty big “if” right there.

>> If however the disaster node/site/link just plain fails and
>> replication goes down for an *indefinite* amount of time, then I want
>> the primary node to continue operating, raise an alert and deal with
>> that. Rather than have the whole system grind to a halt just because a
>> standby node failed.

> If the standby node failed and can be determined to actually be failed
> (by say a cluster manager), you can always have your cluster software
> (or DBA, of course) turn it off by editing the config setting and
> reloading. Doing it that way you can actually *verify* that the site
> is gone for an indefinite amount of time.

The system might as well do this for me when the standby gets disconnected, instead of halting the master.
>> If we were just talking about network glitches then I would be fine
>> with the current behavior because I do not believe they are
>> long-lasting anyway and they are also *quantifiable* which is a huge
>> bonus.

> But the proposed switches doesn't actually make it possible to
> differentiate between these "non-long-lasting" issues and long-lasting
> ones, does it? We might want an interface that actually does...

“replication_timeout”, where the primary disconnects the WAL sender after a timeout, together with “synchronous_standalone_master”, which tells the primary it can continue anyway when that happens, allows exactly that. This would then be a first step in that direction, but I wanted to start out small, and I personally think it is sufficient to draw the line at TCP disconnect of the standby.

> When is the replication mode switched from "standalone" to "sync"?

Good question. Currently that happens when a standby server has connected and also been deemed suitable for synchronous commit by the master (meaning that its name matches the config variable >>
Re: [HACKERS] Standalone synchronous master
Interesting discussion!

>>> Basically I like this whole idea, but I'd like to know why do you think
>>> this functionality is required?

>> How should a synchronous master handle the situation where all
>> standbys have failed ?
>>
>> Well, I think this is one of those cases where you could argue either
>> way. Someone caring more about high availability of the system will
>> want to let the master continue and just raise an alert to the
>> operators. Someone looking for an absolute guarantee of data
>> replication will say otherwise.

> If you don't care about the absolute guarantee of data, why not just
> use async replication? It's still going to replicate the data over to
> the client as quickly as it can - which in the end is the same level
> of guarantee that you get with this switch set, isn't it?

This setup does still guarantee that if the master fails, then you can still fail over to the standby without any possible data loss, because all data is synchronously replicated.

I want to replicate data with synchronous guarantee to a disaster site *when possible*. If there is any chance that commits can be replicated, then I’d like to wait for that.

If however the disaster node/site/link just plain fails and replication goes down for an *indefinite* amount of time, then I want the primary node to continue operating, raise an alert and deal with that, rather than have the whole system grind to a halt just because a standby node failed.

It’s not so much that I don’t “care” about the replication guarantee - then I’d just use asynchronous and be done with it. My point is that it is not always black and white, and for some system setups you have to balance a few things against each other.

If we were just talking about network glitches then I would be fine with the current behavior, because I do not believe they are long-lasting anyway and they are also *quantifiable*, which is a huge bonus.

My primary focus is system availability, but I also care about all that other stuff too.
I want to have the cake and eat it at the same time, as we say in Sweden ;)

>>> When is the replication mode switched from "standalone" to "sync"?
>>
>> Good question. Currently that happens when a standby server has
>> connected and also been deemed suitable for synchronous commit by the
>> master (meaning that its name matches the config variable
>> synchronous_standby_names). So in a setup with both synchronous and
>> asynchronous standbys, the master only considers the synchronous ones
>> when deciding on standalone mode. The asynchronous standbys are
>> “useless” to a synchronous master anyway.

> But wouldn't an async standby still be a lot better than no standby at
> all (standalone)?

As soon as the standby comes back online, I want to wait for it to sync.

>>> The former might block the transactions for a long time until the standby
>>> has caught up with the master even though synchronous_standalone_master is
>>> enabled and a user wants to avoid such a downtime.

>> If we are talking about a network “glitch”, then the standby would take
>> a few seconds/minutes to catch up (not hours!), which is acceptable if
>> you ask me.

> So it's not OK to block the master when the standby goes away, but it
> is OK to block it when it comes back and catches up? The goes-away
> might be the same amount of time - or even shorter, depending on
> exactly how the network works..

To be honest I don’t have a very strong opinion here, we could go either way; I just wanted to keep this patch as small as possible to begin with.

But again, network glitches aren’t my primary concern in an HA system, because the amount of data that the standby lags behind is possible to estimate and plan for. Typically switch convergence takes on the order of 15-30 seconds, and I can thus typically assume that the restarted standby can recover that gap in less than a minute. So once upon a blue moon, when something like that happens, commits would take up to say 1 minute longer. No big deal IMHO.

>>> 1. While synchronous replication is running normally, replication
>>> connection is closed because of network outage.
>>> 2. The master works standalone because of
>>> synchronous_standalone_master=on and some new transactions are
>>> committed though their WAL records are not replicated to the standby.
>>> 3. The master crashes for some reasons, the clusterware detects it and
>>> triggers a failover.
>>> 4. The standby which doesn't have recent committed transactions
>>> becomes the master at a failover...
>>> Is this scenario acceptable?

>> So you have two separate failures in less time than an admin would
>> have time to react and manually bring up a new standby.

> Given that one is a network failure, and one is a node failure, I
> don't see that being strange at all. For example, a HA network
> environment might cause a short glitch when it's failing over to a
> redundant node - enough to bring down the replication connection and
> require it to reconnect (during
Re: [HACKERS] Standalone synchronous master
Hello, and thank you for your feedback, I appreciate it.

Updated patch: sync-standalone-v2.patch

I am sorry to be spamming the list, but I just cleaned it up a little bit, wrote better comments, tried to move most of the logic into syncrep.c since that's where it belongs anyway, and also fixed a small bug where standalone mode was enabled/disabled at runtime via SIGHUP.

> Basically I like this whole idea, but I'd like to know why do you think this
> functionality is required?

How should a synchronous master handle the situation where all standbys have failed?

Well, I think this is one of those cases where you could argue either way. Someone caring more about high availability of the system will want to let the master continue and just raise an alert to the operators. Someone looking for an absolute guarantee of data replication will say otherwise. I don’t like introducing config variables just for the fun of it, but I think in this case there is no right and wrong. Oracle Data Guard replication has three different configurable modes called “performance/availability/protection”, which for postgres corresponds exactly with “async/sync+standalone/sync”.

> When is the replication mode switched from "standalone" to "sync"?

Good question. Currently that happens when a standby server has connected and also been deemed suitable for synchronous commit by the master (meaning that its name matches the config variable synchronous_standby_names). So in a setup with both synchronous and asynchronous standbys, the master only considers the synchronous ones when deciding on standalone mode. The asynchronous standbys are “useless” to a synchronous master anyway.

> The former might block the transactions for a long time until the standby has
> caught up with the master even though synchronous_standalone_master is
> enabled and a user wants to avoid such a downtime.

If we are talking about a network “glitch”, then the standby would take a few seconds/minutes to catch up (not hours!), which is acceptable if you ask me.

If we are talking about, say, a node failure, I suppose the workaround even on current code is to bring up the new standby first as asynchronous and then simply switch it to synchronous by editing synchronous_standby_names on the master. Voila! :) So in effect this is a non-issue since there is a possible workaround, agree?

> 1. While synchronous replication is running normally, replication
> connection is closed because of network outage.
> 2. The master works standalone because of
> synchronous_standalone_master=on and some new transactions are
> committed though their WAL records are not replicated to the standby.
> 3. The master crashes for some reasons, the clusterware detects it and
> triggers a failover.
> 4. The standby which doesn't have recent committed transactions becomes
> the master at a failover...
> Is this scenario acceptable?

So you have two separate failures in less time than an admin would have time to react and manually bring up a new standby. I’d argue that your system is not designed to be redundant enough if that kind of scenario worries you.

But the point where it all goes wrong is where the “clusterware” decides to fail over automatically. In that kind of setup, synchronous_standalone_master most likely must be off, but again, if the “clusterware” is not smart enough to take the right decision then it should not act at all. Better to just log critical alerts, send SMS to people, etc. Makes sense? :)

Cheers,

/A

sync-standalone-v2.patch
Description: Binary data
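The combination discussed in this thread, a replication timeout plus a standalone fallback, can be sketched as a toy state machine. All names here are hypothetical; this illustrates the idea, not the patch's actual implementation:

```python
class SyncRepState:
    """Toy model of the primary-side decision discussed in this thread:
    wait for the sync standby, but fall back to standalone mode after a
    timeout. Purely illustrative."""

    def __init__(self, replication_timeout=30.0, standalone_allowed=True):
        self.replication_timeout = replication_timeout
        self.standalone_allowed = standalone_allowed
        self.standby_connected = True
        self.disconnect_time = None
        self.standalone = False

    def standby_lost(self, now):
        self.standby_connected = False
        self.disconnect_time = now

    def standby_back(self):
        self.standby_connected = True
        self.disconnect_time = None
        self.standalone = False      # wait for sync again once it reconnects

    def must_wait_for_sync(self, now):
        """Should a committing backend block waiting for the standby?"""
        if self.standby_connected:
            return True
        if not self.standalone_allowed:
            return True              # classic sync rep: block indefinitely
        if now - self.disconnect_time >= self.replication_timeout:
            self.standalone = True   # timeout expired: run standalone
        return not self.standalone
```

With a timeout of 0 this degenerates to the behavior the v1/v2 patches implement (fall back immediately on TCP disconnect); with the timeout disabled it degenerates to stock synchronous replication.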
[HACKERS] Standalone synchronous master
Hi all,

I’m new here, so maybe someone else already has this in the works? Anyway, proposed change/patch:

Add a new parameter:

    synchronous_standalone_master = on | off

to control whether a master configured with synchronous_commit = on is allowed to stop waiting for standby WAL sync when all synchronous standby WAL senders are disconnected. Current behavior is that the master waits indefinitely until a synchronous standby becomes available or until synchronous_commit is disabled manually. This would still be the default, so synchronous_standalone_master defaults to off.

Previously discussed here:

http://archives.postgresql.org/pgsql-hackers/2010-10/msg01009.php

I’m attaching a working patch against master/HEAD, and I hope the spirit of Christmas will make you look kindly on my attempt :) or something ... It works fine, and I added some extra logging so that it would be possible to follow along more easily from an admin's point of view. It looks like this when starting the primary server with synchronous_standalone_master = on:

$ ./postgres
LOG:  database system was shut down at 2011-12-25 20:27:13 CET
<-- No standby is connected at startup
LOG:  not waiting for standby synchronization
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
<-- First sync standby connects here so switch to sync mode
LOG:  standby "tx0113" is now the synchronous standby with priority 1
LOG:  waiting for standby synchronization
<-- standby wal receiver on the standby is killed (SIGKILL)
LOG:  unexpected EOF on standby connection
LOG:  not waiting for standby synchronization
<-- restart standby so that it connects again
LOG:  standby "tx0113" is now the synchronous standby with priority 1
LOG:  waiting for standby synchronization
<-- standby wal receiver is first stopped (SIGSTOP) to make sure we have
    outstanding waits in the primary, then killed (SIGKILL)
LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on standby connection
LOG:  not waiting for standby synchronization
<-- client now finally receives the commit ACK that was hanging due to the
    SIGSTOP:ed wal receiver on the standby node

And so on ... any comments are welcome :)

Thanks and cheers,

/A

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0cc3296..6367dcc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2182,6 +2182,24 @@ SET ENABLE_SEQSCAN TO OFF;
+      synchronous_standalone_master (boolean)
+      synchronous_standalone_master configuration parameter
+
+      Specifies how the master behaves when synchronous_commit is set to on
+      and synchronous_standby_names is configured but no appropriate standby
+      servers are currently connected. If enabled, the master will continue
+      processing transactions alone. If disabled, all the transactions on
+      the master are blocked until a synchronous standby has appeared.
+
+      The default is disabled.
+
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e9ae1e8..706af88 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -353,6 +353,8 @@ CheckpointerMain(void)
 	/* Do this once before starting the loop, then just at SIGHUP time. */
 	SyncRepUpdateSyncStandbysDefined();
+	SyncRepUpdateSyncStandaloneAllowed();
+	SyncRepCheckIfStandaloneMaster();

 	/*
 	 * Loop forever
 	 */
@@ -382,6 +384,7 @@ CheckpointerMain(void)
 			ProcessConfigFile(PGC_SIGHUP);
 			/* update global shmem state for sync rep */
 			SyncRepUpdateSyncStandbysDefined();
+			SyncRepUpdateSyncStandaloneAllowed();
 		}
 		if (checkpoint_requested)
 		{
@@ -658,6 +661,7 @@ CheckpointWriteDelay(int flags, double progress)
 		ProcessConfigFile(PGC_SIGHUP);
 		/* update global shmem state for sync rep */
 		SyncRepUpdateSyncStandbysDefined();
+		SyncRepUpdateSyncStandaloneAllowed();
 	}
 	AbsorbFsyncRequests();
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 95de6c7..fd3e782 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -59,6 +59,8 @@
 /* User-settable parameters for sync rep */
 char	   *SyncRepStandbyNames;
+bool		SyncRepStandaloneMasterAllowed;
+
 #define SyncStandbysDefined() \
 	(SyncRepStandbyNames != NULL && SyncRepStandbyNames[0] != '\0')
@@ -126,6 +128,20 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		return;
 	}
+
+	/*
+	 * Fast exit also if no synchronous standby servers are presently connected
+	 * and if the primary ser