On Sun, 2011-01-02 at 08:08 -0600, Kevin Grittner wrote:

> I think you're talking about different metrics, and you're both
> right. With two servers configured in sync rep your chance of having
> an available (running) server is 99.9992%. The chance that you know
> that you have one that is totally up to date, with no lost
> transactions is 99.9208%. The chance that you *actually* have
> up-to-date data would be higher, but you'd have no way to be sure.
> The 99.96% number is your certainty that you have a running server
> with up-to-date data if only one machine is sync rep.
>
> It's a matter of whether your shop needs five nines of availability
> or the highest probability of not losing data. You get to choose.

Thanks for those calculations.

Do you agree that requiring a response from 2 sync standbys, or locking
up, gives us 94% server availability, but 99.9992% data durability? And
that adding additional async servers would not increase the server
availability of that cluster?

Now let's look at what happens when we first start a standby: we do the
base backup, configure the standby, it connects and then <wham> we
cannot process any new transactions until the standby has caught up,
which could well be hours on a big database. So if we don't have a
processing mode that allows work to continue, how will we ever enable
synchronous replication on a 24/7 database? How will we ever allow
standbys to catch up if they drop out for a while?

We should factor that into the availability calcs as well.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
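
As a rough check on the availability and durability figures discussed
above, here is a minimal sketch. It assumes every server fails
independently and is up 98% of the time. That 98% is an assumption,
chosen because it reproduces the 94%, 99.9992% and 99.96% figures; it is
not stated in this message, and the function names are illustrative
only. The sketch does not attempt to reproduce the 99.9208% figure.

```python
from math import prod

def all_up(avail):
    """Probability that every server is up at once.

    This is the availability you get if commits must wait for all
    of the listed servers (master plus required sync standbys).
    """
    return prod(avail)

def any_up(avail):
    """Probability that at least one server is up."""
    return 1.0 - prod(1.0 - a for a in avail)

# ASSUMPTION: per-server availability, not given in this message.
P = 0.98

three = [P] * 3  # master + 2 sync standbys
two = [P] * 2    # master + 1 sync standby

print(f"all three up:         {all_up(three):.4%}")  # 94.1192%
print(f"at least one of three: {any_up(three):.4%}")  # 99.9992%
print(f"at least one of two:   {any_up(two):.4%}")    # 99.9600%
```

Under this model, adding an async standby does not change all_up()
for the servers that commits must wait on, which is the point about
async servers not raising the availability of the cluster. Time spent
waiting for a new or returning standby to catch up is not modelled
here and would lower the first figure further.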