I'm reading the additions you made to the pgsql resource agent to allow
for streaming replication in Postgres 9.1+.  I'm trying to determine if your
resource agent will compensate if the promoted node (the new master) does not
have the newest data.

From the looks of the pgsql_pre_promote function, it seems that it will just
fail the other replicas (slaves) that have newer data, but will continue with
the promotion of the new master even though it does not have the latest data.

If this is correct, is there a way to force the promotion of the node with the
newest data?

v/r

STEVE


On Mar 26, 2013, at 8:19 AM, Steven Bambling <smbambl...@arin.net> wrote:

Excellent, thanks so much for the clarification.  I'll drop this new RA in and
see if I can get things working.

STEVE


On Mar 26, 2013, at 7:38 AM, Rainer Brestan <rainer.bres...@gmx.net> wrote:


Hi Steve,
The pgsql RA does the same: it compares the last_xlog_replay_location of all
nodes for master promotion.
Doing the promote as a restart instead of a promote command, to preserve the
timeline ID, is also a configurable option (restart_on_promote) of the current
RA.
And the RA is definitely capable of handling more than two instances. It goes
through the node_list parameter and performs its actions for every member of
the node list.
Originally it might have been planned to have only one slave, but the current
implementation does not have this limitation. It has code for synchronous
replication across more than two nodes, so that members which fall back into
async replication are not promoted.
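
For illustration, a rough sketch of such a comparison (the hostnames,
connection options and LSN arithmetic below are assumptions, not the actual
RA code):

  # Find the furthest-ahead node by comparing
  # pg_last_xlog_replay_location() (PostgreSQL 9.1/9.2) across the
  # members of node_list.
  NODE_LIST="p1.example.net p2.example.net"
  best_node=""; best_num=-1
  for node in $NODE_LIST; do
      loc=$(psql -h "$node" -U postgres -Atc \
            "SELECT pg_last_xlog_replay_location();") || continue
      # Convert "X/Y" into one comparable integer
      # (high and low 32 bits of the xlog position).
      num=$(( (0x${loc%/*} << 32) + 0x${loc#*/} ))
      if [ "$num" -gt "$best_num" ]; then
          best_num=$num; best_node=$node
      fi
  done
  echo "furthest ahead: $best_node"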

Of course I will share the extensions with the community when they are ready
for use. And the feature of having more than two instances is not removed. I
am not running more than two instances on one site; current usage is two
instances per site, with two sites, and the master managed by booth. But it is
also under discussion to have more than two instances on one site, just to
avoid any availability interruption when one server is down and the other
promotes with a restart.
The implementation is nearly finished; then the stress tests of failure
scenarios begin.

Rainer
Sent: Tuesday, 26 March 2013 at 11:55
From: "Steven Bambling" <smbambl...@arin.net>
To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] OCF Resource agent promote question

On Mar 26, 2013, at 6:32 AM, Rainer Brestan <rainer.bres...@gmx.net> wrote:


Hi Steve,
when Pacemaker does a promotion, it has already selected a specific node to
become master.
It is far too late at that stage to try to update master scores.

But there is another problem with xlog in PostgreSQL.

According to some discussion on the PostgreSQL mailing lists, xlog entries
that are not relevant do not advance the xlog counter during redo and/or
startup. This is especially true for CHECKPOINT xlog records, where the
situation can easily be reproduced.
This can lead to a situation where replication is up to date, but the slave
shows a lower xlog value.
This issue was solved in 9.2.3, where the WAL receiver always counts the end
of applied records.

We are currently testing with 9.2.3.  I'm using the functions from
http://www.databasesoup.com/2012/10/determining-furthest-ahead-replica.html
along with a tweaked function that reports replay_lag in bytes, for a more
accurate measurement.
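
For reference, a minimal sketch of such a byte-based lag check, assuming 9.2+
where pg_xlog_location_diff() is available (hostnames and connection options
are placeholders):

  # On the master: the current write position.
  master_loc=$(psql -h p1.example.net -U postgres -Atc \
               "SELECT pg_current_xlog_location();")
  # On the slave: bytes of WAL not yet replayed (0 = fully caught up).
  psql -h p2.example.net -U postgres -Atc \
       "SELECT pg_xlog_location_diff('$master_loc',
               pg_last_xlog_replay_location());"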

There is also a second annoying issue. The timeline change is replicated to
the slaves, but they do not save it anywhere. If a slave starts up again and
does not have access to the WAL archive, it cannot start anymore. This was
also addressed as a patch in the 9.2 branch, but I haven't tested whether it
is fixed in 9.2.3.

After talking with one of the Postgres guys, it was recommended that we look
at an alternative to the built-in trigger file, which makes the master jump to
a new timeline.  Instead we move recovery.conf to recovery.done in place via
the resource agent and then restart the postgresql service on the "new"
master, so that it keeps the original timeline that the slaves will recognize.
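
Roughly, that promote-by-restart step looks like this (the PGDATA path and the
restart command are simplifications):

  PGDATA=/var/lib/pgsql/data   # placeholder data directory
  # Renaming recovery.conf ends recovery on the next start, and a
  # restart (instead of "pg_ctl promote" or a trigger file) keeps the
  # original timeline ID.
  mv "$PGDATA/recovery.conf" "$PGDATA/recovery.done"
  pg_ctl -D "$PGDATA" restart -m fast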

For data replication, no matter whether PostgreSQL or any other database, you
always have two choices:
- Data consistency is the top priority. Don't go into operation unless
everything is fine.
- Availability is the top priority. Always try to have at least one running
instance, even if its data might not be the latest.

The current pgsql RA does quite a good job for the first choice.

It currently has some limitations.
- After a switchover, whether manual or automatic, it needs some work from
maintenance personnel.
- Some sequences of faults lead to no existing master without manual work.
- Geo-redundant replication with a multi-site cluster ticket system (booth)
does not work.
- If availability or unattended operation is the priority, it cannot be used
out of the box.

But it has a very good structure to be extended for other needs.

And this is what I am currently implementing: extending the RA to support both
choices and preparing it for a multi-site cluster ticket system.

Would you be willing to share your extended RA?  Also, do you run a cluster
with more than 2 nodes?

v/r

STEVE



Regards, Rainer
Sent: Tuesday, 26 March 2013 at 00:01
From: "Andreas Kurz" <andr...@hastexo.com>
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] OCF Resource agent promote question
Hi Steve,

On 2013-03-25 18:44, Steven Bambling wrote:
> All,
>
> I'm trying to work on an OCF resource agent that uses PostgreSQL
> streaming replication. I'm running into a few issues that I hope might
> be answered, or at least some pointers given to steer me in the right
> direction.

Why are you not using the existing pgsql RA? It is capable of doing
synchronous and asynchronous replication and it is known to work fine.

Best regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now


>
> 1. A quick way of obtaining a list of "Online" nodes in the cluster
> that a resource will be able to migrate to. I've accomplished it with
> some grep and sed, but it's not pretty or fast.
>
> # time pcs status | grep Online | sed -e "s/.*\[\(.*\)\]/\1/" | sed 's/ //'
> p1.example.net p2.example.net
>
> real    0m2.797s
> user    0m0.084s
> sys     0m0.024s
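
As an aside, crm_node can print the members of the current partition directly,
which may be faster than parsing pcs output (for a quorate partition this
should match the online nodes):

  # Prints the partition members, space separated.
  crm_node -p
  # p1.example.net p2.example.net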
>
> Once I get a list of active/online nodes in the cluster, my thinking was
> to use psql to get the current xlog location and lag of each of the
> remaining nodes and compare them. If a node has a greater log
> position and/or less lag, it will be given a greater master preference.
>
> 2. How to force a monitor/probe before a promote is run on ALL nodes, to
> make sure that the master preference is up to date before
> migrating/failing over the resource.
> - I was thinking that maybe during the promote call it could get the log
> location and lag from each of the nodes via a psql call (like above)
> and then force the resource to a specific node. Is there a way to do
> this, and does it sound like a sane idea?
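
One illustrative way to keep that preference current is to publish a score
from the monitor action; the score derivation below is made up, and crm_master
assumes it is called from within an OCF RA (OCF_RESOURCE_INSTANCE set):

  # Derive a crude score from the local replay position: more WAL
  # replayed => higher master preference. Illustrative only.
  loc=$(psql -U postgres -Atc "SELECT pg_last_xlog_replay_location();")
  score=$(( 0x${loc#*/} / 1024 ))
  # Publish it; Pacemaker promotes the node with the highest score.
  crm_master -l reboot -v "$score"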
>
>
> The start of my RA is located here; suggestions and comments are 100%
> welcome: https://github.com/smbambling/pgsqlsr/blob/master/pgsqlsr
>
> v/r
>
> STEVE