On Mon, Dec 26, 2011 at 15:59, Alexander Björnhagen
<alex.bjornha...@gmail.com> wrote:
>>>> Basically I like this whole idea, but I'd like to know why do you think 
>>>> this functionality is required?
>>> How should a synchronous master handle the situation where all
>>> standbys have failed ?
>>> Well, I think this is one of those cases where you could argue either
>>> way. Someone caring more about high availability of the system will
>>> want to let the master continue and just raise an alert to the
>>> operators. Someone looking for an absolute guarantee of data
>>> replication will say otherwise.
>>If you don't care about the absolute guarantee of data, why not just
>>use async replication? It's still going to replicate the data over to
>>the client as quickly as it can - which in the end is the same level
>>of guarantee that you get with this switch set, isn't it?
> This setup does still guarantee that if the master fails, then you can
> still fail over to the standby without any possible data loss because
> all data is synchronously replicated.

Only if you didn't have a network hitch, and your slave wasn't down.

Which basically means it doesn't *guarantee* it.

> I want to replicate data with synchronous guarantee to a disaster site
> *when possible*. If there is any chance that commits can be
> replicated, then I’d like to wait for that.

There's always a chance, it's just about how long you're willing to wait ;)

Another thought could be to have something like a "sync_wait_timeout",
saying "I'm willing to wait <n> seconds for the syncrep to be caught
up. If nobody is caught up within that time, then I can back down to
async mode/"standalone" mode". That way, data availability wouldn't
be affected by short-lived network glitches.
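
To make the timeout idea concrete, here is a rough sketch of the
decision it implies. This is only a model, not PostgreSQL code: the
class name, the mode() method and the three mode strings are made up
for illustration, and "sync_wait_timeout" is just the hypothetical
setting named above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncWaitPolicy:
    """Hypothetical model of the "sync_wait_timeout" idea; not PostgreSQL code."""

    sync_wait_timeout: float  # seconds the master is willing to wait

    def mode(self, now: float, standby_lost_at: Optional[float]) -> str:
        # standby_lost_at is None while a synchronous standby is caught up.
        if standby_lost_at is None:
            return "sync"
        # Short glitch: keep blocking commits so the guarantee still holds.
        if now - standby_lost_at < self.sync_wait_timeout:
            return "waiting"
        # Nobody caught up within the timeout: back down to standalone.
        return "standalone"


policy = SyncWaitPolicy(sync_wait_timeout=30.0)
print(policy.mode(now=100.0, standby_lost_at=None))   # sync
print(policy.mode(now=110.0, standby_lost_at=100.0))  # waiting
print(policy.mode(now=140.0, standby_lost_at=100.0))  # standalone
```

The point of the sketch is that a single number decides how long a
glitch may last before the master gives up on the guarantee.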

> If however the disaster node/site/link just plain fails and
> replication goes down for an *indefinite* amount of time, then I want
> the primary node to continue operating, raise an alert and deal with
> that. Rather than have the whole system grind to a halt just because a
> standby node failed.

If the standby node failed and can be determined to actually be failed
(by say a cluster manager), you can always have your cluster software
(or DBA, of course) turn it off by editing the config setting and
reloading. Doing it that way you can actually *verify* that the site
is gone for an indefinite amount of time.
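
As a sketch of what the cluster software could do at that point,
assuming the setting in question is synchronous_standby_names and that
emptying it is how sync rep gets turned off: the helper below only
rewrites the config text. The function name is made up, and the reload
itself (for example "pg_ctl reload" or SELECT pg_reload_conf()) is
left out.

```python
import re

# Matches active (uncommented) synchronous_standby_names lines only.
_SETTING = re.compile(r"^[ \t]*synchronous_standby_names[ \t]*=.*$", re.MULTILINE)


def disable_sync_standbys(conf_text: str) -> str:
    """Return postgresql.conf text with synchronous_standby_names emptied.

    Hypothetical helper for illustration; the caller still has to write
    the file back and reload the server.
    """
    new_line = "synchronous_standby_names = ''"
    new_text, count = _SETTING.subn(new_line, conf_text)
    if count == 0:
        # No active setting found: append one at the end of the file.
        sep = "" if conf_text == "" or conf_text.endswith("\n") else "\n"
        new_text = conf_text + sep + new_line + "\n"
    return new_text


conf = "max_wal_senders = 3\nsynchronous_standby_names = 'standby1'\n"
print(disable_sync_standbys(conf))
```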

> It’s not so much that I don’t “care” about replication guarantee, then
> I’d just use asynchronous and be done with it. My point is that it is
> not always black and white and for some system setups you have to
> balance a few things against each other.

Agreed in principle :-)

> If we were just talking about network glitches then I would be fine
> with the current behavior because I do not believe they are
> long-lasting anyway and they are also *quantifiable* which is a huge
> bonus.

But the proposed switch doesn't actually make it possible to
differentiate between these "non-long-lasting" issues and long-lasting
ones, does it? We might want an interface that actually does...

> My primary focus is system availability but I also care about all that
> other stuff too.
> I want to have the cake and eat it at the same time as we say in Sweden ;)

Of course - we all do :D

>>>> When is the replication mode switched from "standalone" to "sync"?
>>> Good question. Currently that happens when a standby server has
>>> connected and also been deemed suitable for synchronous commit by the
>>> master ( meaning that its name matches the config variable
>>> synchronous_standby_names ). So in a setup with both synchronous and
>>> asynchronous standbys, the master only considers the synchronous ones
>>> when deciding on standalone mode. The asynchronous standbys are
>>> “useless” to a synchronous master anyway.
>>But wouldn't an async standby still be a lot better than no standby at
>>all (standalone)?
> As soon as the standby comes back online, I want to wait for it to sync.

I guess I just find this very inconsistent. You're willing to wait,
but only sometimes. You're not willing to wait when it goes down, but
you are willing to wait when it comes back. I don't see why this
should be different, and I don't see how you can reliably
differentiate between these two.

>>>> The former might block the transactions for a long time until the standby 
>>>> has caught up with the master even though synchronous_standalone_master is 
>>>> enabled and a user wants to avoid such a downtime.
>>> If we are talking about a network “glitch”, then the standby would take
>>> a few seconds/minutes to catch up (not hours!) which is acceptable if
>>> you ask me.
>>So it's not Ok to block the master when the standby goes away, but it
>>is ok to block it when it comes back and catches up? The outage
>>might last the same amount of time - or even less, depending on
>>exactly how the network works.
> To be honest I don’t have a very strong opinion here, we could go
> either way, I just wanted to keep this patch as small as possible to
> begin with. But again network glitches aren’t my primary concern in a
> HA system because the amount of data that the standby lags behind is
> possible to estimate and plan for.
> Typically switch convergence takes in the order of 15-30 seconds and I
> can thus typically assume that the restarted standby can recover that
> gap in less than a minute. So once upon a blue moon when something
> like that happens, commits would take up to say 1 minute longer. No
> big deal IMHO.

What about the slave rebooting, for example? That'll usually be pretty
quick too - so you'd be ok waiting for that. But your patch doesn't
let you wait for that - it will switch to standalone mode right away?
But if it takes 30 seconds to reboot, and then 30 seconds to catch up,
you are *not* willing to wait for the first 30 seconds, but you *are*
willing to wait for the second? Just seems strange to me, I guess...

>>>> 1. While synchronous replication is running normally, replication
>>>> connection is closed because of
>>>>    network outage.
>>>> 2. The master works standalone because of
>>>> synchronous_standalone_master=on and some
>>>>    new transactions are committed though their WAL records are not
>>>> replicated to the standby.
>>>> 3. The master crashes for some reasons, the clusterware detects it and
>>>> triggers a failover.
>>>> 4. The standby which doesn't have recent committed transactions
>>>> becomes the master at a failover...
>>>> Is this scenario acceptable?
>>> So you have two separate failures in less time than an admin would
>>> have time to react and manually bring up a new standby.
>>Given that one is a network failure, and one is a node failure, I
>>don't see that being strange at all. For example, a HA network
>>environment might cause a short glitch when it's failing over to a
>>redundant node - enough to bring down the replication connection and
>>require it to reconnect (during which the master would be ahead of the
>>In fact, both might well be network failures - one just making the
>>master completely inaccessible, and thus triggering the need for a
> You still have two failures on a two-node system.

Yes - but only one (or zero) of them is actually a failure of one of the nodes :-)

> If we are talking about a setup with only two nodes (which I am), then
> I think it’s fair to limit the discussion to one failure (whatever
> that might be! node,switch,disk,site,intra-site link, power, etc ...).
> And in that case, there are only really three likely scenarios :
> 1)      The master fails
> 2)      The standby fails
> 3)      Both fail (due to shared network gear, power, etc)
> Yes there might be a need to failover and Yes the standby could
> possibly have lagged behind the master but with my sync+standalone
> mode, you reduce the risk of that compared to just async mode.
> So decrease the risk of data loss (case 1), increase system
> availability/uptime (case 2).
> That is actually a pretty good description of my goal here :)
> Cheers,
> /A

 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)