On Mon, Nov 17, 2014 at 9:34 AM, Andrew Beekhof <and...@beekhof.net> wrote:
>
>> On 14 Nov 2014, at 10:57 pm, Dmitry Matveichev <d.matveic...@mfisoft.ru>
>> wrote:
>>
>> Hello,
>>
>> We have a cluster configured via pacemaker+corosync+crm. The configuration
>> is:
>>
>> node master
>> node slave
>> primitive HA-VIP1 IPaddr2 \
>> params ip=192.168.22.71 nic=bond0 \
>> op monitor interval=1s
>> primitive HA-variator lsb:variator \
>> op monitor interval=1s \
>> meta migration-threshold=1 failure-timeout=1s
>> group HA-Group HA-VIP1 HA-variator
>> property cib-bootstrap-options: \
>> dc-version=1.1.10-14.el6-368c726 \
>> cluster-infrastructure="classic openais (with plugin)" \
>
> General advice, don't use the plugin. See:
>
> http://blog.clusterlabs.org/blog/2013/pacemaker-and-rhel-6-dot-4/
> http://blog.clusterlabs.org/blog/2013/pacemaker-on-rhel6-dot-4/
>
>> expected-quorum-votes=2 \
>> stonith-enabled=false \
>> no-quorum-policy=ignore \
>> last-lrm-refresh=1383871087
>> rsc_defaults rsc-options: \
>> resource-stickiness=100
>>
>> First I take the variator service down on the master node (actually I
>> delete the service binary and kill the variator process, so the variator
>> fails to restart). Resources very quickly move to the slave node as
>> expected. Then I restore the binary on the master and restart the variator
>> service. Now I do the same thing with the binary and service on the slave
>> node. The crm status command quickly shows me HA-variator (lsb:variator):
>> Stopped. But it takes too much time (for us) before resources are switched
>> to the master node (around 1 min).
>
> I see what you mean:
>
> 2013-12-21T07:04:12.230827+04:00 master crmd[14267]: notice:
> te_rsc_command: Initiating action 2: monitor HA-variator_monitor_1000 on
> slave.mfisoft.ru
> 2013-12-21T05:45:09+04:00 slave crmd[7086]: notice: process_lrm_event:
> slave.mfisoft.ru-HA-variator_monitor_1000:106 [ variator.x is stopped\n ]
>
> (1 minute goes by)
>
> 2013-12-21T07:05:14.232029+04:00 master crmd[14267]: error: print_synapse:
> [Action 2]: In-flight rsc op HA-variator_monitor_1000 on slave.mfisoft.ru
> (priority: 0, waiting: none)
> 2013-12-21T07:05:14.232102+04:00 master crmd[14267]: warning:
> cib_action_update: rsc_op 2: HA-variator_monitor_1000 on slave.mfisoft.ru
> timed out
>
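
For reference, the timestamps in the quoted log lines can be compared directly. This is only a rough sketch: the variable names are mine, and it assumes the slave's process_lrm_event line belongs to the same monitor operation that the master initiated, which the logs alone do not prove.

```python
from datetime import datetime

# Timestamps copied verbatim from the quoted log lines.
master_initiated = datetime.fromisoformat("2013-12-21T07:04:12.230827+04:00")
master_timed_out = datetime.fromisoformat("2013-12-21T07:05:14.232102+04:00")
slave_result = datetime.fromisoformat("2013-12-21T05:45:09+04:00")

# How long the master waited between initiating the monitor and
# declaring it timed out.
wait_seconds = (master_timed_out - master_initiated).total_seconds()

# Apparent offset between the two nodes' clocks, ASSUMING both lines
# describe the same monitor operation.
skew = master_initiated - slave_result

print(f"master waited {wait_seconds:.3f} s before the timeout")
print(f"apparent clock offset master - slave: {skew}")
```

If that assumption holds, the two nodes disagree about the current time by more than an hour, which is what prompts the question below.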
Is it possible that pacemaker is confused by the time difference between master and slave?

> Is there a corosync log file configured? That would have more detail on
> slave.
>
>> Then line
>> Failed actions:
>> HA-variator_monitor_1000 on slave 'unknown error' (1): call=-1,
>> status=Timed Out, last-rc-change='Sat Dec 21 03:59:45 2013', queued=0ms,
>> exec=0ms
>> appears in the crm status and resources are switched.
>>
>> What is that timeout? Where can I change it?
>>
>> ------------------------
>> Kind regards,
>> Dmitriy Matveichev.
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org