Hi,

On Monday 21 December 2009 12:44:17 pm Dejan Muhamedagic wrote:
> Hi,
>
> On Fri, Dec 18, 2009 at 03:44:11PM +0100, Sebastian Reitenbach wrote:
> > Hi,
> >
> > I have a 4 node cluster, managing some XEN resources. The XEN resources
> > have location constraints defined, based on pingd. On each node, a pingd
> > clone is running. XEN resources are only started when the pingd is able
> > to ping the ping node. The xen nodes also have a preferred and fallback
> > location defined. The pingd resources have a timeout of 60 seconds
> > defined.
> > The cluster nodes run on SLES11, x86_64, with these rpms installed:
> > heartbeat-3.0.0-33.2
> > pacemaker-1.0.5-4.1
> > libpacemaker3-1.0.5-4.1
> > pacemaker-mgmt-client-1.99.2-7.1
> > pacemaker-mgmt-1.99.2-7.1
> > openais-0.80.3-26.1
> > libopenais2-0.80.3-26.1
> >
> > I want to switch to a redundant network layout, using spanning tree
> > between the switches. In case of a spanning tree recalculation because of
> > a path failure or whatever other reason, I don't want to have nodes
> > declared as dead because they cannot send heartbeats to each other at
> > that time.
> >
> > Therefore I tried to prepare pacemaker on the cluster nodes.
> > I put the whole cluster in maintenance mode via the hb_gui.
> >
> > Then I reconfigured /etc/ha.d/ha.cf and defined deadtime 70 and initdead
> > 100. Then I restarted heartbeat on each cluster node. I waited until all
> > cluster members were marked green/online in the GUI again. Then I turned
> > off the maintenance mode.
> > All XEN resources were shut down immediately.
>
> Oops.
>
> > Then
>
> A sentence missing?
>
> > In the hb_gui, the pingd resources looked a bit "strange". After leaving
> > the maintenance mode, only one pingd resource showed the description
> > ocf::pacemaker:pingd, in hb_gui under Management. They were green, and
> > showed it running on ['<server>'].
> >
> > Then I tried to restart the XEN resources manually, but the cluster only
> > tried to start them on one host, not on the preferred or fallback
> > location.
> >
> > Then I shut down heartbeat on all 4 cluster nodes again, put back
> > the old ha.cf file, with deadtime 15 and initdead 40, and restarted
> > heartbeat. After the cluster was running, the pingd resources were also
> > started up. And then after the 60 seconds, the ping attribute was set,
> > and the XEN resources were started up on all hosts.
> >
> > I wonder about some things:
> > 1. Why three of the pingd resources had no description shown after
> > leaving the maintenance mode.
> >
> > 2. Why all XEN resources were shut down after leaving the maintenance
> > mode. Here I have a theory: In maintenance mode, the pingd attribute did
> > not get updated, and because heartbeat was restarted on each node, the
> > attribute was not set. Therefore when leaving the maintenance mode,
> > pacemaker decided to shut down the XEN resources, because the pingd
> > attribute was not set.
>
> Sounds like a plausible explanation.
>
> > 3. Why the pingd attribute was not set immediately after pingd started
> > up, and was able to ping the ping node. After the pingd was started, it
> > waited 60 seconds (the timeout value) to set the attribute, so that
> > only then the XEN resources were able to start, due to their location
> > constraint.
> >
> > 4. Maybe the answers to the other questions will answer this already:
> > Why the cluster behaved that strangely at all with the large timeout
> > values set in ha.cf.
> >
> > I could also send a cluster-report in case it may help to figure out what
> > was wrong here, I just did not want to send a large attachment to the
> > list in the first place.
>
> Probably best to open a bugzilla and attach the report there.
> I guess that special care is necessary when setting resources to
> the unmanaged mode in case there are constraints which depend on
> pingd attributes.

I'm just updating another cluster to SLES11. I will try to reproduce the
problem there and create a bug report with hb_report attached.
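
For the theory in point 2, a minimal sketch of a check that could be run before leaving maintenance mode: it lists the nodes whose status section carries no pingd attribute. This is only an illustration under assumptions. It assumes the CIB XML comes from something like `cibadmin -Q`, and that transient node attributes sit under node_state/transient_attributes as nvpair elements. The function name, node names, and sample XML are hypothetical and not taken from the cluster described above.

```python
import xml.etree.ElementTree as ET


def nodes_missing_attribute(cib_xml, attr_name="pingd"):
    """Return the uname of every node_state that has no transient
    attribute called attr_name (assumed CIB layout, see text above)."""
    root = ET.fromstring(cib_xml)
    missing = []
    for node_state in root.iter("node_state"):
        names = set()
        # Only look at transient (status) attributes, not other nvpairs.
        for transient in node_state.findall("transient_attributes"):
            for nvpair in transient.iter("nvpair"):
                names.add(nvpair.get("name"))
        if attr_name not in names:
            missing.append(node_state.get("uname"))
    return missing


# Hypothetical sample: node1 has the pingd attribute, node2 does not.
SAMPLE_CIB = """
<cib>
  <status>
    <node_state id="n1" uname="node1">
      <transient_attributes id="n1">
        <instance_attributes id="status-n1">
          <nvpair id="status-n1-pingd" name="pingd" value="1"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
    <node_state id="n2" uname="node2">
      <transient_attributes id="n2">
        <instance_attributes id="status-n2">
          <nvpair id="status-n2-probe" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
  </status>
</cib>
"""

if __name__ == "__main__":
    print(nodes_missing_attribute(SAMPLE_CIB))
```

If the list is not empty, a location constraint that requires the pingd attribute would presumably stop the resources on those nodes as soon as they become managed again. The sketch only checks for presence of the attribute, not for its value.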
thanks
Sebastian

> Thanks,
>
> Dejan
>
> > regards,
> > Sebastian
> >
> > _______________________________________________
> > Pacemaker mailing list
> > Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker