03.07.2013, 16:26, "Takatoshi MATSUO" <matsuo....@gmail.com>:
> Hi Andrey
>
> 2013/7/3 Andrey Groshev <gre...@yandex.ru>:
>> 03.07.2013, 06:43, "Takatoshi MATSUO" <matsuo....@gmail.com>:
>>> Hi Stefano
>>>
>>> 2013/7/2 Stefano Sasso <stesa...@gmail.com>:
>>>> Hello folks,
>>>> I have the following setup in mind, but I need some advice and one hint
>>>> on how to realize a particular function.
>>>>
>>>> I have an N (>= 2) node cluster, with data storage on PostgreSQL.
>>>> I would like to manage Postgres master-slave replication in this way: one
>>>> node is the "master", one is the "slave", and the others are "standby"
>>>> nodes.
>>>> If the master fails, the slave becomes the master, and one of the standbys
>>>> becomes the slave.
>>>> If the slave fails, one of the standbys becomes the new slave.
>>> Does "standby" mean that PostgreSQL is stopped?
>>> If the master doesn't have the WAL files which the new slave needs,
>>> the new slave can't connect to the master.
>>>
>>> How do you solve that?
>>> Copy the data or WAL archive automatically on start?
>>> That may time out if PostgreSQL has a large database.
>>>> If one of the "standby" nodes fails, no problem :)
>>>> I can correctly manage this configuration with ms and a custom script
>>>> (using ocf:pacemaker:Stateful as an example). If the cluster is already
>>>> operational, the failover works fine.
>>>>
>>>> My problem is about cluster start-up: in fact, only the previously running
>>>> master and slave own the most up-to-date data; so I would like the new
>>>> master to be the "old master" (or, even, the old slave), and the new
>>>> slave to be the "old slave" (but this one is not mandatory). The
>>>> important thing is that the new master should have up-to-date data.
>>>> This should happen even if the servers are booted up with some minutes of
>>>> delay between them. (Users are very stupid sometimes.)
>>> The latest pgsql RA embraces these ideas to manage replication.
>>>
>>> 1. First boot
>>> The RA compares the data and promotes the PostgreSQL instance which has
>>> the latest data.
>>> The number of comparisons can be changed using the xlog_check_count
>>> parameter. If the monitor interval is 10 sec and xlog_check_count is 360,
>>> the RA can wait 1 hour to promote :)
>> But in this case, when the master dies, electing a new master will also
>> take up to one hour. Is that right?
>
> No: if the slave's data is up to date, the master changes the slave's
> master-score. So Pacemaker stops the master and promotes the slave
> immediately when the master dies.
>
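As an aside, the worst-case promotion wait implied by the parameters quoted above can be sketched with a small calculation (the values are the example ones from the thread, not the RA's defaults):

```shell
#!/bin/sh
# Example values from the thread above: the RA compares xlog locations
# once per monitor call, so the worst-case wait before the first
# promote is roughly monitor_interval * xlog_check_count.
monitor_interval=10      # seconds between monitor calls (example value)
xlog_check_count=360     # pgsql RA parameter (example value)

wait_seconds=$((monitor_interval * xlog_check_count))
echo "worst-case wait before first promote: ${wait_seconds}s"   # prints 3600s
```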
Wait... In the function have_master_right:

....snip....
    # get xlog locations of all nodes
    for node in ${NODE_LIST}; do
        output=`$CRM_ATTR_REBOOT -N "$node" -n \
                "$PGSQL_XLOG_LOC_NAME" -G -q 2>/dev/null`
....snip....
    if [ "$new" -ge "$OCF_RESKEY_xlog_check_count" ]; then
        newestXlog=`printf "$newfile\n" | sort -t " " -k 2,3 -r | \
                    head -1 | cut -d " " -f 2`
        if [ "$newestXlog" = "$mylocation" ]; then
            ocf_log info "I have a master right."
            $CRM_MASTER -v $PROMOTE_ME
            return 0
        fi
        change_data_status "$NODENAME" "DISCONNECT"
        ocf_log info "I don't have correct master data."

        # reset counter
        rm -f ${XLOG_NOTE_FILE}.*
        printf "$newfile\n" > ${XLOG_NOTE_FILE}.0
    fi

    return 1
}

As I understand it, this checks the xlog on all nodes
$OCF_RESKEY_xlog_check_count or more times. And this function is called from
pgsql_replication_monitor, which in turn is called from pgsql_monitoring.
That is, until "monitor" has been called another $OCF_RESKEY_xlog_check_count
times, have_master_right will not return true.
I'm recalling the structure of your code from memory :) Or am I wrong?

>>> 2. Second boot
>>> The master manages the slave's data status using an attribute with the
>>> "-l forever" option.
>>> So the RA can't start PostgreSQL if the node doesn't have the latest data.
>>>> My idea is the following:
>>>> the MS resource is not started when the cluster comes up; on startup
>>>> there will only be one "arbitrator" resource (started on only one node).
>>>> This resource reads from somewhere which node was the previous master and
>>>> which was the previous slave, and it waits up to 5 minutes to see if one
>>>> of them comes up.
>>>> In the positive case, it forces the MS master resource to run on that node
>>>> (and starts it); in the negative case, once the wait timer expires, it
>>>> starts the master resource on a random node.
>>>>
>>>> Is that possible? How can I avoid a single resource starting on cluster
>>>> boot? Or, could you advise another way to do this setup?
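To illustrate the comparison in the snippet above: the RA collects one "<node> <xlog_location>" line per node and treats the lexicographically greatest location as the newest. A standalone sketch with made-up node names and xlog locations (not real attribute values):

```shell
#!/bin/sh
# Made-up "<node> <xlog_location>" lines, standing in for the values
# the RA reads back via crm_attribute for each node in $NODE_LIST.
newfile="node1 0000000009000090
node2 000000000A0000B8
node3 0000000008000140"

# Assume this node is node2, which happens to hold the highest location.
mylocation="000000000A0000B8"

# Same selection the RA performs: reverse-sort on the location field
# and take the location from the top entry.
newestXlog=$(printf '%s\n' "$newfile" | sort -t " " -k 2,3 -r | head -1 | cut -d " " -f 2)

if [ "$newestXlog" = "$mylocation" ]; then
    echo "I have a master right."        # this branch wins for node2
else
    echo "I don't have correct master data."
fi
```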
>>>>
>>>> I hope I was clear, my English is not so good :)
>>>> Thank you so much,
>>>> stefano
>>>>
>>>> --
>>>> Stefano Sasso
>>>> http://stefano.dscnet.org/
>>> Regards,
>>> Takatoshi MATSUO
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org