I've done some basic testing with your patch and it looks to work like a charm.

I've also confirmed that upon losing a sync hot standby, PostgreSQL will just wait forever for a sync hot standby to come back online to complete the transaction. To the client it looks like a "stuck" connection.
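A minimal sketch of how to observe this from the master, assuming PostgreSQL 9.1/9.2 and the standard pg_stat_replication view: if no connected standby reports sync_state = 'sync', a commit issued with synchronous_commit = on blocks until a sync standby returns, which is the "stuck" connection described above.

    -- Run on the master: which standby, if any, is currently synchronous?
    -- No row with sync_state = 'sync' means synchronous commits will hang
    -- until a synchronous standby reconnects.
    SELECT application_name, state, sync_priority, sync_state
      FROM pg_stat_replication;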
v/r

STEVE

On Apr 4, 2013, at 8:08 AM, Takatoshi MATSUO <matsuo....@gmail.com> wrote:

> Hi Steven
>
> I made a patch as a trial.
> https://github.com/t-matsuo/resource-agents/commit/bd3b587c6665c4f5eba0491b91f83965e601bb6b#heartbeat/pgsql
>
> This patch never shows "STREAMING|POTENTIAL".
>
> Thanks,
> Takatoshi MATSUO
>
> 2013/4/4 Takatoshi MATSUO <matsuo....@gmail.com>:
>> Hi Steven
>>
>> Sorry for the late reply.
>>
>> 2013/3/29 Steven Bambling <smbambl...@arin.net>:
>>> Takatoshi/Rainer, thanks so much for the quick responses and clarification.
>>>
>>> In response to rep_mode being set to sync:
>>>
>>> If the master is running the monitor check as often as every 1s, then it's updating the nodes with the "new" master preference in the event that the current synchronous replica couldn't be reached and the postgres service then selected the next node in the synchronous_standby_names list to perform the synchronous transaction with.
>>>
>>> If you are doing multiple transactions a second, then doesn't it become possible for the postgres service to switch its synchronous replication replica (from node2 to node3, for instance) and potentially fail (though I think the risk seems small) before the monitor function is invoked to update the master preference?
>>>
>>> In this case you've committed a transaction (or transactions) and reported back to your application that it was successful, but when the new master is promoted it doesn't have the committed transactions, because they are located on the other replica (and the failed master). Basically you've lost these transactions even though they were reported successful.
>>
>> Yes!
>> I didn't consider this situation.
>>
>>> The only way I can see getting around this would be to compare the current xlog locations on each of the remaining replicas, then promote the one that meets your business needs.
>>>
>>> 1. If you need greater data consistency:
>>>    - promote the node that has the furthest log location, even if it hasn't all been replayed and there is some "recovery" period.
>>>
>>> 2. If you need greater "up time":
>>>    - promote the node that has the furthest log location, taking into account the replay lag, or
>>>    - promote the node that has the furthest (or nearly furthest) log location and the least replay lag.
>>
>> How do slaves get "up time"?
>> I think slaves can't know the replay lag.
>>
>>> Does this even seem possible with a resource agent, or is my thinking totally off?
>>
>> Methods 1 and 2 may cause data loss.
>> If you can accept data loss, you can use "rep_mode=async".
>> It's about the same as method 1.
>>
>> How about refraining from switching the synchronous replication replica, to avoid data loss, by setting only one node in "synchronous_standby_names"?
>>
>> Thanks,
>> Takatoshi MATSUO
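As a sketch of the xlog comparison discussed above (assuming PostgreSQL 9.1/9.2 function names, and not part of Takatoshi's patch), each surviving replica can report its own positions; the node whose receive location is furthest ahead would be the candidate under method 1, and the distance between receive and replay location approximates the replay lag from method 2.

    -- Run on each remaining replica; compare the results across nodes.
    -- receive_location: last xlog position received from the old master
    -- replay_location:  last xlog position already replayed
    -- (the gap between the two is the replay lag)
    SELECT pg_last_xlog_receive_location() AS receive_location,
           pg_last_xlog_replay_location()  AS replay_location;

Takatoshi's suggestion corresponds to listing a single node in postgresql.conf, for example synchronous_standby_names = 'node2', so the master never switches its sync partner on its own.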
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org