On Mar 26, 2013, at 6:32 AM, Rainer Brestan 
<rainer.bres...@gmx.net> wrote:


Hi Steve,
when Pacemaker does a promotion, it has already selected a specific node to 
become master.
At that point it is far too late to try to update master scores.

But there is another problem with xlog in PostgreSQL.

According to some discussion on the PostgreSQL mailing lists, irrelevant xlog 
entries do not advance the xlog counter during redo and/or startup. This is 
especially true for CHECKPOINT xlog records, where the situation can easily be 
reproduced.
This can lead to a situation where the replication is up to date, but the 
slave shows a lower xlog value.
This issue was fixed in 9.2.3, where the WAL receiver always counts the end of 
applied records.

We are currently testing with 9.2.3.  I'm using the functions from 
http://www.databasesoup.com/2012/10/determining-furthest-ahead-replica.html 
and tweaking one of them to report the replay_lag in bytes, for a more 
accurate measurement.
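For anyone curious about the byte arithmetic behind such a measurement: an xlog location like 0/3000000 is a 64-bit position written as two hex halves, so the lag in bytes is just the difference of the decoded positions. A small sketch (the helper names are hypothetical, not the functions from the link above):

```shell
#!/bin/bash
# Decode an xlog location "HI/LO" (two hex halves of a 64-bit position)
# into an absolute byte offset.
xlog_to_bytes() {
    local hi=${1%%/*} lo=${1##*/}
    echo $(( 16#$hi * 4294967296 + 16#$lo ))
}

# Replay lag in bytes = master position minus slave replay position.
xlog_lag_bytes() {
    echo $(( $(xlog_to_bytes "$1") - $(xlog_to_bytes "$2") ))
}

# Example: master at 0/3000000, slave replayed up to 0/2000000.
xlog_lag_bytes "0/3000000" "0/2000000"   # prints 16777216 (16 MB)
```

On 9.2 the same diff can also be computed server-side on the master, e.g. 
SELECT pg_xlog_location_diff(pg_current_xlog_location(), replay_location) 
FROM pg_stat_replication;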

There is also a second annoying issue. The timeline change is replicated to the 
slaves, but they do not save it anywhere. If a slave starts up again and does 
not have access to the WAL archive, it cannot start any more. This was also 
addressed by a patch in the 9.2 branch, but I haven't tested whether it is 
fixed in 9.2.3.

After talking with one of the Postgres guys, it was recommended that we look at 
an alternative to the built-in trigger file, which makes the master jump to a 
new timeline.  Instead, via the resource agent we move recovery.conf to 
recovery.done in place and then restart the postgresql service on the "new" 
master, so that it maintains the original timeline that the slaves will 
recognize.
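In case it helps others, the in-place promote we're doing can be sketched roughly like this (the data directory path and the restart command are specific to our setup, so they are parameters here):

```shell
#!/bin/bash
# Sketch of an in-place promote: drop recovery.conf and restart, instead
# of using the trigger file (which would start a new timeline).
promote_in_place() {
    local pgdata=$1
    local restart_cmd=${2:-"service postgresql restart"}
    # Removing recovery.conf means the next startup runs as a master.
    mv "$pgdata/recovery.conf" "$pgdata/recovery.done" || return 1
    # Restart rather than trigger-file promote, so PostgreSQL does not
    # jump to a new timeline and the slaves keep following the old one.
    $restart_cmd
}
```

In the RA this runs as part of the promote action; the restart command would normally be whatever service manager the node uses.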

For data replication, no matter whether PostgreSQL or any other database, you 
always have two modes of operation to choose from.
- Data consistency is the top priority. Don't go into operation unless 
everything is fine.
- Availability is the top priority. Always try to have at least one 
running instance, even if its data might not be the latest.

The current pgsql RA does quite a good job for the first choice.

It currently has some limitations.
- After a switchover, whether manual or automatic, it needs some work from 
maintenance personnel.
- Certain series of faults leave the cluster without a master unless manual 
work is done.
- Geo-redundant replication with the multi-site cluster ticket system (booth) 
does not work.
- If availability or unattended operation is the priority, it cannot be used 
out of the box.

But its structure is very well suited to being extended for other needs.

And this is what I am currently implementing: extending the RA to support 
both modes of operation and preparing it for a multi-site cluster ticket 
system.

Would you be willing to share your extended RA?  Also, do you run a cluster 
with more than 2 nodes?

v/r

STEVE



Regards, Rainer
Sent: Tuesday, 26 March 2013 at 00:01
From: "Andreas Kurz" <andr...@hastexo.com>
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] OCF Resource agent promote question
Hi Steve,

On 2013-03-25 18:44, Steven Bambling wrote:
> All,
>
> I'm trying to work on a OCF resource agent that uses postgresql
> streaming replication. I'm running into a few issues that I hope might
> be answered or at least some pointers given to steer me in the right
> direction.

Why are you not using the existing pgsql RA? It is capable of doing
synchronous and asynchronous replication and it is known to work fine.

Best regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now


>
> 1. A quick way of obtaining a list of "Online" nodes in the cluster
> that a resource will be able to migrate to. I've accomplished it with
> some grep and sed, but it's not pretty or fast.
>
> # time pcs status | grep Online | sed -e "s/.*\[\(.*\)\]/\1/" | sed 's/ //'
> p1.example.net
> p2.example.net
>
> real    0m2.797s
> user    0m0.084s
> sys     0m0.024s
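[For what it's worth, if crm_node is available (recent Pacemaker), listing the members of the current partition is much cheaper than parsing the full pcs status output:

```shell
# Print the online members of this cluster partition, space separated.
crm_node -p
```

That avoids the whole grep/sed pipeline.]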
>
> Once I get a list of active/online nodes in the cluster, my thinking was
> to use psql to get the current xlog location and lag of each of the
> remaining nodes and compare them. If a node has a greater log
> position and/or less lag, it will be given a greater master preference.
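[That mapping is essentially what crm_master is for: the RA's monitor action can publish a per-node promotion score, and Pacemaker promotes the node with the highest score. A sketch, where the scoring formula and the LAG_BYTES variable are illustrative assumptions, not part of any existing RA:

```shell
# Inside the RA's monitor action: smaller replay lag => higher score.
# LAG_BYTES is assumed to have been obtained beforehand via a psql query.
LAG_BYTES=${LAG_BYTES:-0}
score=$(( 1000 - LAG_BYTES / 1024 ))
# Publish this node's master preference; -l reboot makes it transient,
# so the score is forgotten after a node restart.
crm_master -l reboot -v "$score"
```
]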
>
> 2. How to force a monitor/probe on ALL nodes before a promote is run, to
> make sure that the master preference is up to date before
> migrating/failing over the resource.
> - I was thinking that maybe during the promote call it could get the log
> location and lag from each of the nodes via a psql call (like above)
> and then force the resource to a specific node. Is there a way to do
> this, and does it sound like a sane idea?
>
>
> The start of my RA is located here suggestions and comments 100%
> welcome https://github.com/smbambling/pgsqlsr/blob/master/pgsqlsr
>
> v/r
>
> STEVE
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



