I've done some basic testing with your patch and it looks to work like a charm.

I've also confirmed that upon losing a sync hot standby, PostgreSQL will just wait forever for a sync hot standby to come back online to complete the transaction. To the client it looks like a "stuck" connection.
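A minimal sketch of how to observe this from the master, assuming PostgreSQL 9.1/9.2 and the standard pg_stat_replication view: if no connected standby reports sync_state = 'sync', a commit issued with synchronous_commit = on blocks until a sync standby returns, which is the "stuck" connection described above.

    -- Run on the master: which standby, if any, is currently synchronous?
    -- No row with sync_state = 'sync' means synchronous commits will hang
    -- until a synchronous standby reconnects.
    SELECT application_name, state, sync_priority, sync_state
      FROM pg_stat_replication;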
v/r

STEVE

On Apr 4, 2013, at 8:08 AM, Takatoshi MATSUO <matsuo....@gmail.com> wrote:

> Hi Steven
>
> I made a patch as a trial.
> https://github.com/t-matsuo/resource-agents/commit/bd3b587c6665c4f5eba0491b91f83965e601bb6b#heartbeat/pgsql
>
> This patch never shows "STREAMING|POTENTIAL".
>
> Thanks,
> Takatoshi MATSUO
>
> 2013/4/4 Takatoshi MATSUO <matsuo....@gmail.com>:
>> Hi Steven
>>
>> Sorry for the late reply.
>>
>> 2013/3/29 Steven Bambling <smbambl...@arin.net>:
>>> Takatoshi/Rainer, thanks so much for the quick responses and clarification.
>>>
>>> In response to rep_mode being set to sync:
>>>
>>> If the master is running the monitor check as often as every 1s, then it's updating the nodes with the "new" master preference in the event that the current synchronous replica couldn't be reached and the postgres service then selected the next node in the synchronous_standby_names list to perform the synchronous transaction with.
>>>
>>> If you are doing multiple transactions a second, then doesn't it become possible for the postgres service to switch its synchronous replication replica (from node2 to node3, for instance) and potentially fail (though I think the risk seems small) before the monitor function is invoked to update the master preference?
>>>
>>> In this case you've committed a transaction (or transactions) and reported back to your application that it was successful, but when the new master is promoted it doesn't have the committed transactions, because they are located on the other replica (and the failed master). Basically you've lost these transactions even though they were reported successful.
>>
>> Yes!
>> I didn't consider this situation.
>>
>>> The only way I can see getting around this would be to compare the current xlog locations on each of the remaining replicas, then promote the one that meets your business needs.
>>>
>>> 1. If you need greater data consistency:
>>>    - promote the node that has the furthest log location, even if it hasn't all been replayed and there is some "recovery" period.
>>>
>>> 2. If you need greater "up time":
>>>    - promote the node that has the furthest log location, taking into account the replay lag, or
>>>    - promote the node that has the furthest (or nearly furthest) log location and the least replay lag.
>>
>> How do slaves get "up time"?
>> I think slaves can't know the replay lag.
>>
>>> Does this even seem possible with a resource agent, or is my thinking totally off?
>>
>> Methods 1 and 2 may cause data loss.
>> If you can accept data loss, you can use "rep_mode=async".
>> It's about the same as method 1.
>>
>> How about refraining from switching the synchronous replication replica, to avoid data loss, by setting only one node in "synchronous_standby_names"?
>>
>> Thanks,
>> Takatoshi MATSUO
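As a sketch of the xlog comparison discussed above (assuming PostgreSQL 9.1/9.2 function names, and not part of Takatoshi's patch), each surviving replica can report its own positions; the node whose receive location is furthest ahead would be the candidate under method 1, and the distance between receive and replay location approximates the replay lag from method 2.

    -- Run on each remaining replica; compare the results across nodes.
    -- receive_location: last xlog position received from the old master
    -- replay_location:  last xlog position already replayed
    -- (the gap between the two is the replay lag)
    SELECT pg_last_xlog_receive_location() AS receive_location,
           pg_last_xlog_replay_location()  AS replay_location;

Takatoshi's suggestion corresponds to listing a single node in postgresql.conf, for example synchronous_standby_names = 'node2', so the master never switches its sync partner on its own.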
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org